Population Research ›› 2026, Vol. 50 ›› Issue (1): 121-140.

• Constructing China's Independent Knowledge System of Demography: Refining Quantitative Research Methods in Social Sciences •

Sample Structure and Methodological Pitfalls: A Comparative Analysis Based on Large-Scale Social Survey Data in China

Liu Wenbo, Zhou Hao   

  • Published: 2026-01-29 Online: 2026-01-29
  • About Author: Liu Wenbo is Tenure-Track Associate Professor, School of Humanities and Social Sciences, Harbin Engineering University; Zhou Hao (Corresponding Author) is Professor, Center for Sociological Research and Development of China, Department of Sociology, Peking University. Email: zhouh@pku.edu.cn

样本结构与方法论陷阱:基于中国大型社会调查数据的比较分析

刘文博, 周皓   

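  • Funding:
    This research was supported by the Major Project of the Key Research Institutes of Humanities and Social Sciences of the Ministry of Education, "Research on Key Issues in the Long-Term Balanced Development of China's Population" (22JJD840001), and by the Youth Project of the Heilongjiang Provincial Philosophy and Social Sciences Research Plan, "Research on Changing Trends in Population Mobility in Heilongjiang Province and Their Impact on High-Quality Population Development" (24SHC004).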

Abstract: Understanding the world requires unbiased and valid empirical knowledge. Yet studies of the same topic that use different survey data often produce divergent results and even contradictory conclusions, which undermines efforts to test the reliability and applicability of theories. Existing studies predominantly focus on refining statistical methods while overlooking foundational issues such as sample representativeness. Systematic examinations of the sample structures of widely used large-scale social surveys, and of their impact on statistical findings, are also scarce.

To address this gap, this study draws on the six national large-scale social surveys most widely used by Chinese scholars. It compares their sampling designs and empirically examines the similarities and differences in their sample structures. Using a consistent model specification, it then assesses how deviations in sample structure affect statistical results and reveals the underlying logic by which sample structure influences statistical inference.

The main findings are as follows. First, although almost all surveys employ multi-stage, stratified probability proportional to size (PPS) random sampling, they differ significantly in sampling frame coverage, stratification principles, the sampling methods and number of sampling units at each stage, and within-household sampling procedures. Second, notable disparities exist in the distributions of key demographic variables across the surveys, and each survey's sample structure deviates to some extent from that of the 2015 National 1% Population Sample Survey. Third, differences in sample structure lead to variations in statistical results. Under identical models, analyses based on different survey data yield both a consensus component reflecting shared social realities and significant discrepancies in the significance and direction of effects for certain variables. Fourth, adjustments to population definitions, weighting schemes, variable selection, and operationalization alter the joint distribution of variables within a sample, thereby significantly affecting statistical outcomes. When sample structures differ to begin with, such adjustments may further amplify discrepancies in results across survey datasets. Fifth, theoretical analysis indicates that the foundational role of sample structure in statistical inference must be fully acknowledged.
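The second and fourth findings can be illustrated with a minimal sketch. It is not the authors' procedure: the data are hypothetical toy values, the benchmark shares are invented stand-ins for a census-type reference, and the helper names (`shares`, `dissimilarity`, `poststrat_weights`, `weighted_mean`) are assumptions introduced only for this example. The sketch measures how far a sample's composition deviates from a benchmark and shows how simple post-stratification weights, one possible weighting scheme among many, change an estimate.

```python
from collections import Counter

def shares(values):
    """Return the share of each category in a list of labels."""
    counts = Counter(values)
    n = len(values)
    return {k: c / n for k, c in counts.items()}

def dissimilarity(sample_shares, benchmark_shares):
    """Index of dissimilarity: half the sum of absolute share differences."""
    cats = set(sample_shares) | set(benchmark_shares)
    return 0.5 * sum(
        abs(sample_shares.get(c, 0.0) - benchmark_shares.get(c, 0.0))
        for c in cats
    )

def poststrat_weights(groups, benchmark_shares):
    """One weight per respondent: benchmark share / sample share of their group."""
    s = shares(groups)
    return [benchmark_shares[g] / s[g] for g in groups]

def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical toy sample: urban respondents are over-represented.
groups = ["urban"] * 7 + ["rural"] * 3
outcome = [10.0] * 7 + [4.0] * 3

# Hypothetical benchmark composition (not real census figures).
benchmark = {"urban": 0.5, "rural": 0.5}

deviation = dissimilarity(shares(groups), benchmark)
weights = poststrat_weights(groups, benchmark)
unweighted = sum(outcome) / len(outcome)
weighted = weighted_mean(outcome, weights)

print(f"deviation from benchmark: {deviation:.2f}")
print(f"unweighted mean: {unweighted:.2f}, weighted mean: {weighted:.2f}")
```

In this toy case the unweighted and weighted means differ only because the sample's urban-rural composition departs from the benchmark, which is the kind of mechanism the findings describe. Real surveys would require the design weights and documentation supplied by the survey institution.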

Based on these findings, the study recommends that researchers meticulously review survey technical documentation, prudently select survey data suited to their research objectives, appropriately address missing data and weighting, prioritize robustness checks of analytical results, and thoroughly evaluate or explain the sample representativeness of the survey data used. Survey institutions, for their part, should provide more detailed weighting information and comprehensive technical documentation so that researchers can use the data more appropriately.

The primary contributions of this study are as follows. (1) Empirically, it systematically examines the sample structures of six large-scale social surveys and the impact of sample structure deviations on statistical results, revealing methodological pitfalls that offer a new perspective on the contradictory conclusions drawn from different datasets in the existing literature. (2) Theoretically, it extends methodological reflection in quantitative research from model specification back to the data-collection stage, broadening scholarly discourse. (3) Practically, it provides empirical guidance for standardizing data usage in quantitative research, thereby enhancing the comparability and robustness of research conclusions.

Keywords: Large-Scale Social Survey, Sample Structure, Sampling Design, Methodological Pitfalls

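Abstract (Chinese version, translated): Understanding the world requires unbiased and valid empirical knowledge. Studies of the same topic that use different survey data may produce different results, which in turn affects how well social reality is reflected and how theories are tested. Taking the 2015 National 1% Population Sample Survey as a benchmark, this article compares the sampling designs, sample structures, and statistical results under identical model specifications of six large-scale social surveys in China. It finds that the surveys use similar sampling methods but differ in their sampling procedures; that their data differ in the structure of some basic demographic variables, and all deviate from the 2015 1% survey; that deviations in sample structure lead to differences in statistical results under identical model specifications; and that adjustments to the target population definition, weighting, variable selection, and operationalization all affect statistical results by changing the joint distribution of variables. Theoretical analysis shows that the foundational role of sample structure in statistical inference needs to be fully recognized. To avoid the methodological pitfalls caused by deviations in sample structure, researchers are advised to read survey technical documentation carefully and to give full consideration to the possible influence of sampling design on analytical results.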
