Syllabus

QSSBS, Spring 2025

Course Description

This course aims to equip students with core concepts and practical skills essential for modern quantitative social science research. The curriculum spans from traditional statistical analysis to cutting-edge machine learning methods, and from survey data to text, image, and historical document analysis. Students will develop the ability to systematically investigate complex social science questions using Big Data and AI methodologies.

By the end of the course, students will use at least one of the methods above to complete an empirical research project or replicate the main results of a classic published paper.

Evaluation

Component                                              Weight
Participation                                          10%
Midterm: Group Project Proposal                        40%
Final: Group Project Report for Preliminary Results    30%
Final: Group Project Report for Final Results          20%

Main Textbooks

Supplementary Textbooks

Computing Software: R/RStudio/Python

  • All of them are free.

Additional Readings

  • Judea Pearl & Dana Mackenzie, The Book of Why, 2018. (Chinese translation: 《为什么:关于因果关系的新科学》, CITIC Press.)

  • Abhijit V. Banerjee & Esther Duflo, Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty, 2011. (Chinese translation: 《贫穷的本质:我们为什么摆脱不了贫穷》, CITIC Press.)

  • Abhijit V. Banerjee & Esther Duflo, Good Economics for Hard Times, 2019. (Chinese translation: 《好的经济学》, CITIC Press.)

Outline

(Preliminary; subject to adjustment)

Week  Topic                      Key Contents
1     Course Introduction        - What the course is about
                                 - Benefits of learning
                                 - Learning approach
                                 - Introduction to social science data in the Big Data and AI era
2     Software Setup             - R and Python installation
                                 - GitHub/Copilot
                                 - Visual Studio Code and RStudio
                                 - ChatGPT/Claude/Gemini and other LLMs
3     Descriptive Statistics I   - Data types and structures
                                 - Data transformation
                                 - Missing values and outliers
4     Descriptive Statistics II  - ggplot2 usage
                                 - Multidimensional visualization
                                 - Theme customization
                                 - Color schemes
5     Causal Inference I         - Definition of causality
                                 - Correlation vs. causation
                                 - Potential outcomes
                                 - Directed acyclic graphs (DAGs)
6     Causal Inference II        - Introduction to randomized controlled trials (RCTs)
                                 - Randomization methods
                                 - Sample size calculation
                                 - Statistical inference
7     Midterm Group Project Proposal
8     Regression Analysis        - OLS linear regression
                                 - Model diagnostics
                                 - Variable selection
                                 - Interaction effects
9     Matching Methods           - Matching basics
                                 - Propensity score matching
                                 - Sensitivity analysis
10    Machine Learning I         - Supervised learning
                                 - Model evaluation
                                 - Cross-validation
                                 - Feature engineering
11    Machine Learning II        - Decision trees
                                 - Random forests
                                 - Support vector machines
12    Spatial Data Analysis      - Spatial data types
                                 - GIS fundamentals
                                 - Mapping techniques
13    Text Data Analysis         - Text preprocessing
                                 - Topic modeling
                                 - Sentiment analysis
                                 - LLM applications
14    OCR Technology             - OCR principles
                                 - Image preprocessing
                                 - Text recognition
15    Web Scraping               - Web scraping basics
                                 - Data extraction
                                 - Anti-scraping handling
16    Final Group Project Report for Preliminary Results

Office Hours