Sample Structure and Methodological Pitfalls: A Comparative Analysis Based on Large-Scale Social Survey Data in China
Liu Wenbo, Zhou Hao
Population Research, 2026, 50(1): 121-140.
Abstract
Understanding the world requires unbiased and valid empirical knowledge. Yet studies of the same topic that use different survey data often produce divergent results and even contradictory conclusions, which hampers efforts to test the reliability and applicability of theories. Existing research, however, focuses predominantly on refining statistical methods while overlooking foundational issues such as sample representativeness. Few studies have systematically examined the sample structures of widely used large-scale social surveys or how those structures affect statistical findings.

To address this gap, this study draws on the six national large-scale social surveys most widely used by Chinese scholars. It compares their sampling designs and empirically examines the similarities and differences in their sample structures. Applying an identical model specification to every dataset, it investigates how deviations in sample structure affect statistical results and reveals the underlying logic by which sample structure shapes statistical inference.
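
One simple way to quantify how far a survey's sample structure departs from a benchmark such as a census is the index of dissimilarity, D = 0.5 * sum_k |p_k - q_k|, taken over the categories of a demographic variable. The abstract does not say which measure the authors used; the Python sketch below swaps in this standard index purely as an illustration, and the function name, age groups and shares in it are hypothetical.

```python
# Index of dissimilarity between a sample distribution and a benchmark.
# Illustrative only: the function name, categories and numbers are hypothetical.


def dissimilarity_index(sample: dict[str, float],
                        benchmark: dict[str, float]) -> float:
    """Return D = 0.5 * sum_k |p_k - q_k| over the union of categories.

    Both inputs are normalised to proportions first, so raw counts also work.
    D is 0 for identical distributions and 1 for completely disjoint ones.
    """
    s_total = sum(sample.values())
    b_total = sum(benchmark.values())
    if s_total <= 0 or b_total <= 0:
        raise ValueError("both distributions need a positive total")
    categories = set(sample) | set(benchmark)
    return 0.5 * sum(
        abs(sample.get(k, 0.0) / s_total - benchmark.get(k, 0.0) / b_total)
        for k in categories
    )


# Hypothetical age-group counts for one survey vs. benchmark population shares.
survey_counts = {"16-29": 1800, "30-44": 2600, "45-59": 3400, "60+": 2200}
benchmark_shares = {"16-29": 0.25, "30-44": 0.28, "45-59": 0.27, "60+": 0.20}
print(round(dissimilarity_index(survey_counts, benchmark_shares), 3))
```

In this made-up example D is about 0.09, i.e. roughly 9% of the sample would have to move to a different age group to match the benchmark. Because D looks at one marginal distribution at a time, two surveys can score similarly on every single variable while still differing in the joint distributions that drive regression results.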

The main findings are as follows. First, although almost all of the surveys use multi-stage, stratified probability proportional to size (PPS) random sampling, they differ significantly in sampling-frame coverage, stratification criteria, the selection method and number of sampling units at each stage, and within-household respondent selection. Second, the distributions of key demographic variables differ notably across the surveys, and each survey's sample structure deviates to some extent from that of the 2015 National 1% Population Sample Survey. Third, differences in sample structure lead to different statistical results. Under identical models, analyses of different survey datasets share a consensus component that reflects common social realities, yet they diverge significantly in the significance and direction of the effects of certain variables. Fourth, adjustments to population definitions, weighting schemes, variable selection, and variable operationalization alter the joint distribution of variables within a sample and thereby significantly affect statistical outcomes; when sample structures already differ at the outset, such adjustments may further amplify the discrepancies in results across datasets. Fifth, the foundational role of sample structure in the methodology of statistical inference must be fully acknowledged.
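
To make the PPS step concrete, the Python sketch below shows systematic PPS selection of primary sampling units within a single stratum. It is a textbook illustration under stated assumptions, not the procedure of any of the six surveys: the function name, county labels and size measures are hypothetical, and real designs add further stages (e.g. neighbourhoods or villages, then households, then a respondent within the household).

```python
# Systematic PPS selection of primary sampling units (PSUs) in one stratum.
# Function name, unit names and size measures are hypothetical.
import random
from itertools import accumulate


def pps_systematic(units: list[str], sizes: list[float], n: int,
                   rng: random.Random) -> list[str]:
    """Select n units with probability proportional to size.

    Unit i is included with probability n * sizes[i] / sum(sizes), as long as
    that value is at most 1; a unit larger than the sampling interval can be
    hit more than once and would normally be taken with certainty instead.
    """
    if not units or len(units) != len(sizes) or n <= 0:
        raise ValueError("units and sizes must match, and n must be positive")
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n
    start = rng.random() * interval  # random start in [0, interval)
    chosen, i = [], 0
    for k in range(n):
        point = start + k * interval
        # Move to the first unit whose cumulative size exceeds the point;
        # the bound guards against floating-point overshoot at the end.
        while i < len(units) - 1 and cumulative[i] <= point:
            i += 1
        chosen.append(units[i])
    return chosen


# Example: draw 3 hypothetical counties (size = resident population, in 1000s).
counties = ["A", "B", "C", "D", "E", "F"]
population = [120.0, 45.0, 200.0, 80.0, 60.0, 150.0]
print(pps_systematic(counties, population, 3, random.Random(2024)))
```

Choices such as how the frame is ordered before the systematic pass, how many PSUs are drawn per stratum, and what serves as the size measure are exactly the kinds of design details on which the surveys differ.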

Based on these findings, the study recommends that researchers carefully review survey technical documentation, select survey data suited to their research objectives, handle missing data and weighting appropriately, prioritize robustness checks of their results, and thoroughly evaluate or explain the sample representativeness of the data they use. Survey organizations, for their part, should provide more detailed weighting information and more comprehensive technical documentation so that researchers can use the data appropriately.
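
As a hedged illustration of what "handling weighting" can involve, the Python sketch below computes simple post-stratification weights on one categorical variable so that the weighted sample matches known population shares. It assumes a single weighting variable and hypothetical urban/rural shares; it is not the weighting scheme of any particular survey, and the function name is made up.

```python
# Post-stratification weights on one categorical variable.
# Function name, cell labels and benchmark shares are hypothetical.
from collections import Counter


def poststratify(cells: list[str],
                 population_shares: dict[str, float]) -> list[float]:
    """Return one weight per respondent so weighted cell shares match the benchmark.

    Cell weight = (benchmark share) / (sample share), with benchmark shares
    renormalised over the cells present in the sample, so the weights sum to
    the sample size (mean weight 1). Cells with no respondents cannot be fixed.
    """
    n = len(cells)
    counts = Counter(cells)
    unknown = set(counts) - set(population_shares)
    if n == 0 or unknown:
        raise ValueError(f"empty sample or cells without a benchmark: {sorted(unknown)}")
    share_total = sum(population_shares[c] for c in counts)
    cell_weight = {
        c: (population_shares[c] / share_total) / (counts[c] / n) for c in counts
    }
    return [cell_weight[c] for c in cells]


# Hypothetical sample that over-represents urban residents (60% vs. 50%).
sample = ["urban"] * 6 + ["rural"] * 4
weights = poststratify(sample, {"urban": 0.5, "rural": 0.5})
print([round(w, 3) for w in weights])
```

Here urban respondents receive a weight of about 0.833 and rural respondents 1.25. Weighting on one variable fixes that margin but also reshapes its joint distribution with every other variable in the data, which is one route by which, as the fourth finding notes, weighting choices can shift model results.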

The study makes three main contributions. (1) Using empirical methods, it systematically examines the sample structures of six large-scale social surveys and how deviations in sample structure affect statistical results, revealing methodological pitfalls and offering a new perspective on the contradictory conclusions reached with different datasets in the existing literature. (2) Theoretically, it extends methodological reflection in quantitative research from model specification back to the data-collection stage, broadening scholarly discussion. (3) Practically, it provides empirical guidance for standardizing data use in quantitative research, thereby improving the comparability and robustness of research conclusions.
Destination Selection Mechanism of Migrants in China
Zhou Hao, Liu Wenbo
Population Research, 2022, 46(1): 37-53.
Abstract
Based on the 2017 China Migrants Dynamic Survey and data from relevant statistical yearbooks, this paper uses a nested logit model (NLGT) to analyze how China's floating population selects migration destinations. It takes prefecture-level regions as the basic geographic unit of analysis and the flow sample as its research object. The results show significant structural differences between the flow sample and the stock sample, and the same variables have heterogeneous effects in the two samples. The socioeconomic returns that migrants can personally perceive play an important role in destination selection. Interactions between regional-level and individual characteristics show that the effects of regional-level characteristics on destination selection vary with individual characteristics. The paper suggests that the choice between flow and stock samples should depend on the specific aim of each research question. Expected socioeconomic returns, rather than expected income, are among the most important factors attracting the floating population. Destination selection is a rational and comprehensive decision that members of the floating population make on the basis of their individual characteristics and the regional characteristics of potential destinations.
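
The abstract names a nested logit model but does not give its specification. As a hedged illustration, the Python sketch below computes choice probabilities under the standard two-level nested logit, in which destinations are grouped into nests and a dissimilarity parameter lambda (usually taken to lie between 0 and 1) governs how closely alternatives within a nest substitute for one another. The grouping of prefectures into regional nests, the function name, and all utilities and lambda values are hypothetical; nothing here reproduces the paper's model or estimates.

```python
# Two-level nested logit choice probabilities for destination choice.
# Destinations, nests, utilities and lambdas below are hypothetical inputs.
import math


def nested_logit_probs(utilities: dict[str, float],
                       nests: dict[str, list[str]],
                       lambdas: dict[str, float]) -> dict[str, float]:
    """Return P(j) = P(j | nest m) * P(m) for every alternative j.

    P(j | m) = exp(V_j / lam_m) / sum_{k in m} exp(V_k / lam_m)
    IV_m     = log(sum_{k in m} exp(V_k / lam_m))    # inclusive value
    P(m)     = exp(lam_m * IV_m) / sum_l exp(lam_l * IV_l)
    With every lam_m = 1 this reduces to the multinomial logit.
    """
    inclusive = {
        m: math.log(sum(math.exp(utilities[k] / lambdas[m]) for k in members))
        for m, members in nests.items()
    }
    denom = sum(math.exp(lambdas[m] * inclusive[m]) for m in nests)
    probs = {}
    for m, members in nests.items():
        p_nest = math.exp(lambdas[m] * inclusive[m]) / denom
        for k in members:
            probs[k] = p_nest * math.exp(utilities[k] / lambdas[m] - inclusive[m])
    return probs


# Hypothetical example: three prefectures grouped into two regional nests.
V = {"pref_A": 1.0, "pref_B": 0.5, "pref_C": 0.8}
nests = {"east": ["pref_A", "pref_B"], "west": ["pref_C"]}
probs = nested_logit_probs(V, nests, {"east": 0.5, "west": 1.0})
print({k: round(p, 3) for k, p in probs.items()})
```

In an applied model each utility V_j would be a function of destination attributes and their interactions with migrant characteristics, which is how region-by-individual heterogeneity of the kind the abstract reports enters the choice probabilities.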