Sample Structure and Methodological Pitfalls: A Comparative Analysis Based on Large-Scale Social Survey Data in China
Liu Wenbo, Zhou Hao
Population Research
2026, 50(1): 121-140.
Understanding the world requires unbiased and valid empirical knowledge. Yet studies of the same topic that draw on different survey data often produce divergent results and even contradictory conclusions, which undermines efforts to test whether theories are reliable and applicable. Existing studies focus predominantly on refining statistical methods and overlook foundational issues such as sample representativeness. Systematic examinations of the sample structures of widely used large-scale social surveys, and of how those structures affect statistical findings, are also scarce.
To address this gap, this study draws on the six national large-scale social surveys most extensively used by Chinese scholars. It compares their sampling designs and empirically investigates the similarities and differences in their sample structures. Using an identical model specification across surveys, it then examines how deviations in sample structure affect statistical results and reveals the underlying logic by which sample structure shapes statistical inference.
The main findings are as follows. First, although almost all of the surveys employ multi-stage, stratified probability proportional to size (PPS) random sampling, they differ substantially in sampling-frame coverage, stratification principles, the selection method and number of sampling units at each stage, and within-household respondent selection. Second, the distributions of key demographic variables differ notably across the surveys, and each survey's sample structure deviates to some extent from that of the 2015 National 1% Population Sample Survey. Third, differences in sample structure produce differences in statistical results: under identical models, analyses of different surveys share a consensus component that reflects common social realities, yet diverge markedly in the statistical significance, and even the direction, of the effects of certain variables. Fourth, adjustments to the population definition, weighting scheme, variable selection, and operationalization alter the joint distribution of variables within a sample and thereby substantially affect statistical outcomes; when sample structures already differ, such adjustments can further amplify discrepancies in results across datasets. Fifth, the foundational role of sample structure in statistical inference must be fully acknowledged.
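All six surveys rely on multi-stage, stratified PPS sampling. As a rough illustration of what a single PPS stage does, the Python sketch below implements textbook systematic PPS selection: units (for example, counties with population counts as their size measure) are laid end to end on a line whose length is their total size, and selections are taken at a fixed interval from a random start, so each unit's inclusion probability is n × size / total and its base design weight is the inverse of that probability. The function names and example sizes are hypothetical; this is not the selection procedure of any particular survey, which would repeat such a step within strata and across stages and would typically treat very large units as certainty selections.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, start=None, seed=None):
    """Systematic PPS (illustrative sketch): return indices of n units selected
    with probability proportional to size. A unit larger than the sampling
    interval can be hit more than once; real designs usually treat such units
    as certainty selections."""
    total = sum(sizes)
    interval = total / n
    if start is None:
        # Random start drawn uniformly from [0, interval).
        start = random.Random(seed).random() * interval
    cumulative = list(accumulate(sizes))
    selected, unit = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the first unit whose cumulative size exceeds the point;
        # the index bound guards against floating-point overshoot at the end.
        while unit < len(sizes) - 1 and cumulative[unit] <= point:
            unit += 1
        selected.append(unit)
    return selected


def inclusion_probabilities(sizes, n):
    """First-order inclusion probabilities n * size / total, capped at 1."""
    total = sum(sizes)
    return [min(1.0, n * s / total) for s in sizes]


def design_weights(sizes, n):
    """Base design weights: the inverse of each unit's inclusion probability."""
    return [1.0 / p for p in inclusion_probabilities(sizes, n)]


if __name__ == "__main__":
    # Hypothetical size measures for four primary sampling units.
    sizes = [100, 200, 300, 400]
    print(pps_systematic(sizes, n=2, start=50))  # -> [0, 2]
    print(inclusion_probabilities(sizes, n=2))   # -> [0.2, 0.4, 0.6, 0.8]
```

Because frame coverage, stratification, and the number of units drawn at each stage feed directly into these probabilities, two surveys that both "use PPS" can still end up with quite different inclusion probabilities and hence different sample structures.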
Based on these findings, the study recommends that researchers meticulously review survey technical documentation, prudently select survey data that suit their research objectives, handle missing data and weighting appropriately, prioritize robustness checks of their results, and thoroughly evaluate or explain the sample representativeness of the survey data they use. Survey institutions, for their part, should provide more detailed weighting information and more comprehensive technical documentation so that researchers can use the data appropriately.
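The fourth finding and the recommendation on weighting both turn on how reweighting a sample toward a benchmark changes its distribution and therefore its estimates. A minimal sketch, assuming single-variable post-stratification (only one of several possible weighting schemes; real surveys typically combine design weights, nonresponse adjustments, and raking over several margins), is shown below. It measures how far a sample's distribution deviates from benchmark shares with the Duncan dissimilarity index, derives cell weights as benchmark share divided by sample share, and compares unweighted and weighted means. The helper names, the urban/rural cells, and all numbers are hypothetical and are not taken from any of the six surveys or from the 2015 1% Population Sample Survey.

```python
from collections import Counter


def shares(groups):
    """Proportion of respondents in each category."""
    counts = Counter(groups)
    return {g: c / len(groups) for g, c in counts.items()}


def dissimilarity_index(sample_shares, benchmark_shares):
    """Duncan dissimilarity index: half the summed absolute share gaps.
    0 means identical distributions; 1 means no overlap."""
    cats = set(sample_shares) | set(benchmark_shares)
    return 0.5 * sum(
        abs(sample_shares.get(c, 0.0) - benchmark_shares.get(c, 0.0)) for c in cats
    )


def poststratification_weights(groups, benchmark_shares):
    """Weight each respondent by benchmark share / sample share of their cell,
    so the weighted sample reproduces the benchmark distribution."""
    sample = shares(groups)
    return [benchmark_shares[g] / sample[g] for g in groups]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical sample that over-represents urban respondents (60% vs 50%).
    groups = ["urban"] * 6 + ["rural"] * 4
    outcome = [12] * 6 + [8] * 4  # hypothetical outcome values
    benchmark = {"urban": 0.5, "rural": 0.5}

    print(dissimilarity_index(shares(groups), benchmark))  # about 0.1
    weights = poststratification_weights(groups, benchmark)
    print(sum(outcome) / len(outcome))      # unweighted mean: 10.4
    print(weighted_mean(outcome, weights))  # weighted mean: about 10.0
```

In this toy case weighting moves the mean from 10.4 to about 10.0 because it down-weights the over-represented urban cell. Choosing a different benchmark or adding further weighting variables would shift the estimate again, which mirrors in miniature the sensitivity to weighting schemes described in the fourth finding.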
The study makes three main contributions. (1) Empirically, it systematically examines the sample structures of six large-scale social surveys and how deviations in those structures affect statistical results, exposing methodological pitfalls and offering a new perspective on the contradictory conclusions drawn from different datasets in the existing literature. (2) Theoretically, it extends methodological reflection in quantitative research from model specification back to the data-collection stage, broadening the scope of methodological debate. (3) Practically, it provides empirical guidance for standardizing data use in quantitative research, thereby improving the comparability and robustness of research conclusions.