Journal of Capital Medical University (首都医科大学学报) ›› 2022, Vol. 43 ›› Issue (4): 610-617. doi: 10.3969/j.issn.1006-7795.2022.04.015

• Medical Informatics: Applications and Development •


Interpretable machine learning methods applied in disease risk prediction: a case study of sepsis mortality risk prediction

Yang Fengchun, Zheng Si, Li Jiao*   

  1. Medical Intelligent Computing Division, Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China
  • Received: 2022-02-22    Online: 2022-08-21    Published: 2022-10-28
  • Contact: *E-mail: li.jiao@imicams.ac.cn

Abstract: Objective To investigate the application of interpretable machine learning methods in disease risk prediction. Methods Using sepsis mortality risk prediction as a case study, we collected the electronic medical records (EMRs) of 19 903 intensive care unit (ICU) patients with Sepsis-3 who met the inclusion and exclusion criteria from the Medical Information Mart for Intensive Care (MIMIC)-Ⅳ, a public EMR database. Sepsis mortality prediction models were then built with two intrinsically interpretable models (decision tree and logistic regression) and three complex models [random forest, XGBoost, and light gradient boosting machine (LightGBM)]. The machine learning models were interpreted with global interpretation methods (feature importance, partial dependence plots, individual conditional expectation plots, and a global surrogate model) and local interpretation methods [local interpretable model-agnostic explanations (LIME) and Shapley values], which were used to explore the risk factors affecting the prognosis of sepsis patients. Results The less interpretable models performed better than the intrinsically interpretable ones: the area under the curve (AUC) values of the LightGBM, random forest, and XGBoost models were 0.913, 0.892, and 0.872, respectively, versus 0.779 for logistic regression and 0.791 for the decision tree. Both global and local interpretation methods were then used to explain the decision processes of the machine learning models. Conclusion Global interpretation methods can explain the response trend of a model across the whole feature space, while local interpretation methods can explain how a model makes its decision for a particular case.
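To make the workflow described in the Methods concrete, the sketch below shows in Python how a gradient-boosting classifier can be paired with the global and local interpretation methods named above (feature importance, partial dependence/ICE plots, Shapley values). It is a minimal illustration only, assuming the open-source packages scikit-learn, lightgbm, and shap; the synthetic features (age, lactate, urine_output, sofa_score), the label construction, and all hyperparameters are hypothetical placeholders, not the variables or pipeline actually extracted from MIMIC-IV in the study.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import lightgbm as lgb
    import shap
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from sklearn.inspection import PartialDependenceDisplay

    # Hypothetical tabular data: one row per ICU stay, placeholder clinical
    # features, label y = in-hospital death (1) vs. survival (0).
    rng = np.random.default_rng(0)
    n = 2000
    X = pd.DataFrame({
        "age": rng.normal(65, 15, n),
        "lactate": rng.lognormal(0.5, 0.5, n),
        "urine_output": rng.normal(1500, 600, n),
        "sofa_score": rng.integers(0, 20, n),
    })
    risk = (0.04 * X["age"] + 1.2 * X["lactate"]
            + 0.3 * X["sofa_score"] - 0.001 * X["urine_output"])
    y = (risk + rng.normal(0, 2, n) > risk.median()).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)

    # Complex ("less interpretable") model: LightGBM classifier.
    model = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05)
    model.fit(X_train, y_train)
    print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

    # Global explanation 1: tree-based feature importance.
    print(pd.Series(model.feature_importances_, index=X.columns)
          .sort_values(ascending=False))

    # Global explanation 2: partial dependence and individual conditional
    # expectation (ICE) curves for a single feature.
    PartialDependenceDisplay.from_estimator(
        model, X_test, features=["lactate"], kind="both")
    plt.show()

    # Local explanation: Shapley values for one patient in the test set.
    explainer = shap.TreeExplainer(model)
    print(explainer.shap_values(X_test.iloc[:1]))

A global surrogate (e.g., fitting a shallow decision tree to the LightGBM predictions) and LIME follow the same pattern: the complex model supplies the predictions, and a simpler model or attribution method explains them globally or per case.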

Key words: interpretable machine learning, disease prediction, sepsis

CLC Number: