Journal of Capital Medical University, 2022, Vol. 43, Issue (4): 610-617. doi: 10.3969/j.issn.1006-7795.2022.04.015

• Medical Informatics: Application and Development •

Interpretable machine learning methods applied in disease risk prediction: a case study of sepsis mortality risk prediction

Yang Fengchun, Zheng Si, Li Jiao*   

  1. Medical Intelligent Computing Division, Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China
  • Received: 2022-02-22  Online: 2022-08-21  Published: 2022-10-28
  • Contact: *E-mail: li.jiao@imicams.ac.cn

Abstract: Objective To investigate the application of interpretable machine learning methods in disease risk prediction. Methods Sepsis mortality risk prediction was used as a case study. A total of 19 903 electronic medical records (EMRs) of intensive care unit (ICU) patients who met the Sepsis-3 criteria were collected from the Medical Information Mart for Intensive Care (MIMIC)-IV, a public EMR database. Predictive models of sepsis mortality were then constructed using intrinsically interpretable models (decision tree and logistic regression) and complex models (random forest, XGBoost, and LightGBM). The machine learning models were interpreted using global interpretation methods (feature importance, partial dependence plot, individual conditional expectation plot, and global surrogate model) and local interpretation methods (local interpretable model-agnostic explanations and Shapley values). These methods were used to explore the risk factors affecting the prognosis of sepsis patients. Results The models with low intrinsic interpretability achieved higher predictive performance [area under the curve (AUC) values of 0.913, 0.892, and 0.872 for the LightGBM, random forest, and XGBoost models, respectively] than the intrinsically interpretable models (AUC values of 0.779 and 0.791 for the logistic regression and decision tree models, respectively). Both global and local interpretation methods were applied to explain the decision processes of the machine learning models. Conclusion Global interpretation methods can explain how a model responds across the whole feature space, while local interpretation methods can explain how decisions are made in individual cases.
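To make the workflow described above concrete, the following Python sketch illustrates the general approach: training both intrinsically interpretable and complex models, comparing their AUC values, and then applying global (permutation feature importance, partial dependence/ICE plot, global surrogate) and local (Shapley value) interpretation methods. This is a minimal, hypothetical example, not the study's code: synthetic data stands in for the MIMIC-IV sepsis cohort, a random forest stands in for the full set of complex models, and none of the variables reflect the study's actual features.

    # Minimal sketch under stated assumptions; requires scikit-learn
    # (and optionally the shap package for the final step).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.inspection import permutation_importance, PartialDependenceDisplay

    # Synthetic stand-in for the ICU cohort (binary outcome: in-hospital death).
    X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)

    # Intrinsically interpretable models vs. a complex model.
    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(f"{name}: AUC = {auc:.3f}")

    rf = models["random forest"]

    # Global interpretation 1: permutation feature importance.
    result = permutation_importance(rf, X_test, y_test, n_repeats=10,
                                    random_state=0)
    print("most important feature index:", int(np.argmax(result.importances_mean)))

    # Global interpretation 2: partial dependence plot with ICE curves for one
    # feature (kind="both" overlays the PDP on the individual ICE curves).
    PartialDependenceDisplay.from_estimator(rf, X_test, features=[0], kind="both")

    # Global interpretation 3: a global surrogate, i.e. a shallow interpretable
    # tree fitted to the complex model's predictions to approximate its behavior.
    surrogate = DecisionTreeClassifier(max_depth=3).fit(X_train, rf.predict(X_train))

    # Local interpretation: Shapley values for a single prediction (requires the
    # optional shap package; the return format varies across shap versions).
    import shap
    explainer = shap.TreeExplainer(rf)
    shap_values = explainer.shap_values(X_test[:1])

The other local method named in the abstract, LIME, would be applied analogously to individual patients via the lime package; it is omitted here only to keep the sketch short.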

Key words: interpretable machine learning, disease prediction, sepsis
