XGBoost-SHAP-based interpretable framework for the early identification of pulmonary nodules

Chinese Library Classification (CLC) number: R734.2

Fund Project: Scientific Research Special Project of the Chengdu Medical College Education Development Foundation (25LHZG-12); Key Science and Technology Program of Zigong City, Sichuan Province (2024-YGY-01-04)

    Abstract:

    Objective: To identify pulmonary nodules early and to visualize the contribution of key variables using interpretable machine learning, in support of precise prevention and control and of early diagnosis and treatment of lung cancer. Methods: Individuals at high risk of lung cancer who had completed clinical screening were enrolled, and their high-risk assessment data and imaging results were extracted. Participants were divided into high-risk and low-risk pulmonary nodule groups according to the Chinese lung cancer screening standard (T/CPMA 013-2020). Variables that differed significantly in univariate analysis were used as predictors, with pulmonary nodule group as the dependent variable, to construct an interpretable XGBoost-SHAP identification framework for early nodule identification and visual interpretation of the results. Results: A total of 644 individuals at high risk of lung cancer were included, of whom 199 (30.9%) were in the high-risk pulmonary nodule group. The XGBoost model identified pulmonary nodules with an accuracy of 0.9146, sensitivity of 0.7587, specificity of 0.9843, F1-score of 0.8458, and AUC of 0.9741. SHAP analysis showed a higher risk of nodule enlargement among participants with greater smoking intensity, exposure to secondhand smoke from colleagues or family members, infrequent kitchen ventilation during cooking, high intake of processed foods, occupational exposure to asbestos, radon and similar agents, low intake of protein, vegetables and fruits, and manual labor occupations. Conclusion: The interpretable framework performs well in the early identification of pulmonary nodules. Changes in nodule size are associated not only with traditional risk factors (smoking habits, secondhand smoke exposure, cooking fume exposure, and occupational exposure to asbestos and radon) but also with participants' dietary habits.
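
    The SHAP step in the abstract attributes each prediction to the input variables using Shapley values. As a rough illustration of that idea only, the sketch below computes exact Shapley values for a tiny hand-written risk function. Everything in it is an assumption made for illustration: `toy_risk` stands in for the trained XGBoost model, the three binary features are merely named after factors mentioned in the abstract, the coefficients are invented, and a single all-zero baseline replaces the background data a real SHAP analysis would use. It is not the authors' pipeline.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary features, named after factors the abstract mentions.
FEATURES = ["heavy_smoking", "secondhand_smoke", "poor_kitchen_ventilation"]


def toy_risk(x):
    """Hypothetical risk score standing in for a trained model's output."""
    score = 0.10
    score += 0.30 * x["heavy_smoking"]
    score += 0.15 * x["secondhand_smoke"]
    score += 0.10 * x["poor_kitchen_ventilation"]
    # Invented interaction term: the two smoke exposures reinforce each other.
    score += 0.05 * x["heavy_smoking"] * x["secondhand_smoke"]
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(FEATURES)

    def value(coalition):
        # Features in the coalition take the individual's value;
        # all other features are held at the baseline value.
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


person = {"heavy_smoking": 1, "secondhand_smoke": 1, "poor_kitchen_ventilation": 0}
baseline = {f: 0 for f in FEATURES}
phi = shapley_values(toy_risk, person, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

    The contributions sum to the difference between the individual's score and the baseline score, which is the property that lets a SHAP plot decompose a single prediction. Exact enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one; for tree ensembles such as XGBoost, the `shap` library provides a tree-specific explainer that computes these values efficiently.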

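Cite this article

易付良, 李刚, 刘昕, 向茹梅, 骆长玲, 邓丽春, 余秀莲, 周厚容, 高扬, 邹雪娜. XGBoost-SHAP-based interpretable framework for the early identification of pulmonary nodules [J]. Journal of North Sichuan Medical College (川北医学院学报), 2026, 41(4): 422-427.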


Published online: 2026-05-06