Population Research, 2026, Vol. 50, Issue 3: 112-128.

• Population and Society •

The Dynamic Mechanism, Non-linear Logic, and Systemic Characteristics of Migrants' Social Integration: An Empirical Analysis Based on Interpretable Machine Learning Algorithms

Tang Jie, Jiang Ziheng   

  • Published: 2026-05-29 Online: 2026-05-29
  • About Author: Tang Jie is Professor, School of Smart Governance and School of Public Administration and Policy, Renmin University of China; Jiang Ziheng (Corresponding Author) is PhD Candidate, School of Public Administration and Policy, Renmin University of China. Email: ziheng@ruc.edu.cn

流动人口社会融入的动力机制、非线性逻辑与系统特征——基于可解释机器学习算法的实证分析

唐杰, 蒋子恒   

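  • Funding:
    This article is an interim result of a project supported by the Graduate Scientific Research Fund of the School of Public Administration and Policy, Renmin University of China.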

Abstract: Against the background of China's people-centered new-type urbanization, a central policy challenge is how to promote the long-term social incorporation of internal migrants. This issue is especially salient where access to welfare, housing, and local public services remains uneven. Existing studies have identified institutional, social, and individual determinants of migrant incorporation, but have focused mainly on integration levels, linear relationships, and group comparisons. Less is known about how key factors operate through non-linear effects, threshold changes, and interaction mechanisms, and whether different dimensions of integration follow different pathways. To address these questions, this study examines the dynamic mechanisms, non-linear logic, and systemic characteristics of migrant social integration in China.

The analysis uses data from the resident questionnaires of the “Jiexiang Zhongguo” Urban Survey project organized by Renmin University of China in 2023 and 2024. After screening the resident-level questionnaires, the study focuses on internal migrants who reside in destination communities but do not hold local hukou, yielding 1,839 observations from 300 cities across 31 provincial-level units. The study employs CatBoost to model non-linear relationships and handle missing values, and applies SHAP values to interpret variable importance, marginal effects, and interaction patterns. Recursive feature elimination with cross-validation retains 21 explanatory variables. The benchmark CatBoost model reports an F1 score of 0.75513, an accuracy of 0.75989, and a recall of 0.76766.
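The attribution idea behind SHAP can be illustrated with a minimal sketch. It computes exact Shapley values for a single observation against a fixed baseline, using only the Python standard library. This is not the study's pipeline: the abstract does not give its CatBoost or SHAP configuration, and the function `toy_score`, the three features, and all numbers below are hypothetical, chosen only to show a threshold term and a buffering interaction.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values for instance x relative to a baseline point.

    Features outside a coalition are replaced by their baseline values
    (an assumption of this sketch; tree-based SHAP implementations use
    different, faster algorithms).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


def toy_score(features):
    """Hypothetical integration score, not estimated from any data."""
    service, insured, distance = features
    score = 0.0
    if service >= 3:                         # threshold effect
        score += 0.3
    score += 0.2 * insured                   # direct support effect
    score -= 0.1 * distance * (1 - insured)  # insurance buffers distance
    return score


x = [4, 1, 2]          # hypothetical migrant: satisfied, insured, long move
baseline = [1, 0, 0]   # hypothetical reference point
phi = shapley_values(toy_score, x, baseline)
for name, value in zip(["service", "insured", "distance"], phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score at `x` and the score at `baseline`, which is the additivity property that lets SHAP values be read as per-variable contributions.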

The results show that migrant social integration is a composite process shaped jointly by institutional support, residential conditions, spatial mobility, time allocation, and digital participation. Community service satisfaction, insurance coverage, housing property rights, internet use frequency, and living arrangements constitute the main drivers, whereas economic gradient, migration distance, working hours, and destination-city hierarchy constitute the principal constraints. The nine most important variables account for 59.14% of the total contribution. These factors display threshold effects, non-linear marginal changes, and interaction patterns characterized by buffering, substitution, and reinforcement. Interaction analysis indicates that internet use, community services, and insurance coverage can mitigate disadvantages associated with migration distance and economic gradient; that positive conditions may substitute for one another when one support is already sufficient; and that long-distance mobility and long working hours can become more restrictive under certain conditions. Further analysis indicates that structural integration places greater weight on institutional acquisition, organizational participation, and social embeddedness, whereas cultural integration is more closely associated with identity, belonging, subjective acceptance, community service satisfaction, residential stability, and everyday living environments. This distinction shows that “entering urban society” and “identifying with urban society” are not identical processes.
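A figure such as the share of total contribution held by the top-ranked variables can be derived from per-observation SHAP values. The sketch below assumes that contribution is measured as each variable's mean absolute SHAP value divided by the sum over all variables; the abstract does not state the exact definition used. The variable names and SHAP rows are hypothetical.

```python
def contribution_shares(shap_rows, names):
    """Rank variables by mean |SHAP| and return (name, share) pairs.

    Shares are normalized to sum to 1 across all variables.
    """
    n_rows = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[k]) for row in shap_rows) / n_rows
        for k, name in enumerate(names)
    }
    total = sum(mean_abs.values())
    ranked = sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)
    return [(name, value / total) for name, value in ranked]


def top_k_share(shares, k):
    """Cumulative share of the k highest-ranked variables."""
    return sum(share for _, share in shares[:k])


# Hypothetical SHAP values: one row per respondent, one column per variable.
names = ["community_service", "insurance", "migration_distance", "working_hours"]
shap_rows = [
    [0.4, 0.2, -0.10, -0.1],
    [-0.2, 0.2, -0.20, 0.1],
    [0.6, -0.2, 0.15, -0.1],
]

shares = contribution_shares(shap_rows, names)
for name, share in shares:
    print(f"{name}: {share:.1%}")
print(f"top-2 share: {top_k_share(shares, 2):.1%}")
```

Taking absolute values before averaging means that drivers and constraints are ranked on the same scale, regardless of the sign of their effect.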

Rather than treating social integration as a single, linearly determined outcome, the analysis specifies its internal differentiation and traces how multiple factors interact under certain conditions. At the theoretical level, it clarifies the internal structure of migrant social integration and distinguishes structural incorporation from cultural incorporation. At the practical level, the findings indicate that promoting fuller migrant incorporation requires coordinated community-based public services, portable social insurance, housing support, digital inclusion, and labor protection, while expanding neighborhood interaction and community participation so that migrants can move from “entering urban life” toward “identifying with urban society.”

Keywords: Domestic Migrants, Social Integration, Citizenization, Machine Learning

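Abstract (translated from the Chinese): Using resident questionnaire data from the 2023 and 2024 “Jiexiang Zhongguo” Urban Survey project, this article applies the CatBoost machine learning algorithm and the SHAP interpretation method to unpack the dynamic mechanisms, non-linear logic, and systemic characteristics of migrants' social integration. The study finds that social integration is a systemic contest between positive, support-based drivers and negative, cost-based constraints. Community service satisfaction, insurance coverage, housing property rights, internet use frequency, and living arrangements are the core drivers, whereas economic gradient, migration distance, working hours, and destination-city hierarchy are the main constraints. The effects of these factors are markedly non-linear, threshold-dependent, and interactive, and buffering, substitution, and reinforcement effects operate among them. Further analysis reveals an internal structural divide within social integration: structural integration places more weight on institutional access and embeddedness in social relations, whereas cultural integration places more weight on identity and subjective acceptance. These findings deepen the understanding of the dynamic mechanisms, systemic characteristics, and internal structure of migrants' social integration, and can provide more precise reference information for advancing the people-centered new-type urbanization strategy.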

Keywords (translated from the Chinese): Migrant Population, Social Integration, Citizenization, Machine Learning