Syllabus
QSSBS, Spring 2025
Course Description
This course aims to equip students with core concepts and practical skills essential for modern quantitative social science research. The curriculum spans from traditional statistical analysis to cutting-edge machine learning methods, and from survey data to text, image, and historical document analysis. Students will develop the ability to systematically investigate complex social science questions using Big Data and AI methodologies.
By the end of the course, students will apply at least one of these methods to complete an empirical research project or to replicate the main results of a classic published paper.
Evaluation
Component | Weight |
---|---|
Participation | 10% |
Mid: Group Project Proposal | 40% |
Final: Group Project Report for Preliminary Results | 30% |
Final: Group Project Report for Final Results | 20% |
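As a rough illustration of how the weights above combine, the sketch below computes a weighted course grade. The 0-100 scoring scale, the component keys, and the example scores are assumptions made for illustration only. The syllabus specifies only the weights.

```python
# Weights taken from the Evaluation table; they sum to 100%.
WEIGHTS = {
    "participation": 0.10,
    "midterm_proposal": 0.40,
    "final_preliminary_report": 0.30,
    "final_results_report": 0.20,
}


def course_grade(scores):
    """Return the weighted average of component scores.

    Assumes each component is scored on a 0-100 scale (an assumption,
    not something stated in the syllabus).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
example_scores = {
    "participation": 90,
    "midterm_proposal": 80,
    "final_preliminary_report": 85,
    "final_results_report": 75,
}

print(round(course_grade(example_scores), 1))
```

Because the proposal carries 40% of the grade, a change of 10 points there moves the overall grade by 4 points under this scheme.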
Main Textbooks
Imai, Kosuke and Nora Webb Williams, Quantitative Social Science: An Introduction in tidyverse, 2022. Princeton University Press.
Llaudet, Elena and Kosuke Imai, Data Analysis for Social Science: A Friendly and Practical Introduction, 2023. Princeton University Press.
Gábor Békés and Gábor Kézdi, Data Analysis for Business, Economics, and Policy, Cambridge University Press, 2021.
Supplementary Textbooks
Ismay, Chester and Albert Y. Kim, Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, 2024.
Quan Li, Using R for Data Analysis in Social Sciences, Oxford University Press, 2018.
Joshua Angrist and Jörn-Steffen Pischke, Mastering 'Metrics: The Path from Cause to Effect, Princeton University Press, 2014.
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2021.
Computing Software: R/RStudio/Python
- All of them are free.
Additional Readings
Pearl, Judea and Dana Mackenzie, The Book of Why, 2018. (Chinese translation: 《为什么:关于因果关系的新科学》, CITIC Press.)
Abhijit V. Banerjee & Esther Duflo, Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty, 2011. (Chinese translation: 《贫穷的本质:我们为什么摆脱不了贫穷》, CITIC Press.)
Abhijit V. Banerjee & Esther Duflo, Good Economics for Hard Times, 2019. (Chinese translation: 《好的经济学》, CITIC Press.)
Outline
(Preliminary; subject to adjustment)
Week | Topic | Key Contents |
---|---|---|
1 | Course Introduction | - What the course is about - Benefits of learning - Learning approach - Introduction to Data in Social Science in the Big Data and AI era |
2 | Software Setup | - R and Python installation - GitHub/Copilot - Visual Studio Code and RStudio - ChatGPT/Claude/Gemini and other LLMs |
3 | Descriptive Statistics I | - Data types and structures - Data transformation - Missing values and outliers |
4 | Descriptive Statistics II | - ggplot2 usage - Multidimensional visualization - Theme customization - Color schemes |
5 | Causal Inference I | - Definition of causality - Correlation vs. causation - Potential outcomes - Directed acyclic graphs (DAGs) |
6 | Causal Inference II | - RCTs introduction - Randomization methods - Sample size calculation - Statistical inference |
7 | Midterm | Group Project Proposal |
8 | Regression Analysis | - OLS linear regression - Model diagnostics - Variable selection - Interaction effects |
9 | Matching Methods | - Matching basics - Propensity score matching - Sensitivity analysis |
10 | Machine Learning I | - Supervised learning - Model evaluation - Cross-validation - Feature engineering |
11 | Machine Learning II | - Decision trees - Random forests - Support vector machines |
12 | Spatial Data Analysis | - Spatial data types - GIS fundamentals - Mapping techniques |
13 | Text Data Analysis | - Text preprocessing - Topic modeling - Sentiment analysis - LLM applications |
14 | OCR Technology | - OCR principles - Image preprocessing - Text recognition |
15 | Web Scraping | - Web scraping basics - Data extraction - Anti-scraping handling |
16 | Final | Group Project Report for Preliminary Results |
Office Hours
Instructor: Zhaopeng Qu
Teaching Assistant: Zhengwu Raoxue