Journal of Capital Medical University ›› 2019, Vol. 40 ›› Issue (6): 889-893. doi: 10.3969/j.issn.1006-7795.2019.06.015

• Basic Research •

Classification prediction of lung squamous cell carcinoma and lung adenocarcinoma based on XGBoost

Leng Fei, Li Wei

  1. Genetics and Birth Defects Control Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute, MOE Key Laboratory of Major Diseases in Children, Beijing 100045, China
  • Received: 2019-03-19  Online: 2019-11-21  Published: 2019-12-18
  • Corresponding author: Li Wei  E-mail: liwei@bch.com.cn
  • Supported by:
    National Key Research and Development Project (2016YFC1000306).

Abstract: Objective To predict the lung cancer subtypes lung squamous cell carcinoma and lung adenocarcinoma, and to identify molecular markers. Methods mRNA expression in the two cancer subtypes was studied. mRNAs whose expression differed significantly between the subtypes were selected, and the extreme gradient boosting (XGBoost) algorithm was used to build a model that predicts the subtype. Its prediction performance was compared with that of a logistic regression classification model and a support vector machine (SVM) classification model. Results The XGBoost model achieved a prediction accuracy of 96.55% and an area under the curve (AUC) of 99.04%, outperforming the logistic regression and SVM classification models. In addition, 11 genes were identified as molecular markers for the two subtypes. Conclusion Lung squamous cell carcinoma and lung adenocarcinoma differ markedly at the molecular level, a finding that will assist clinicians in predicting the disease subtype.

Key words: transcriptome, lung squamous cell carcinoma, lung adenocarcinoma, machine learning, disease prediction
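
The abstract compares models by prediction accuracy and area under the curve (AUC). As a minimal sketch of how these two metrics are commonly computed from a classifier's predicted probabilities, the following may help. The function names, the 0.5 threshold, the label convention (1 = squamous, 0 = adenocarcinoma), and the toy values are all assumptions for illustration; none of them come from the study.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(int(p == t) for p, t in zip(preds, y_true))
    return correct / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the chance that a random positive sample scores higher than
    a random negative sample (ties count as half)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values only (hypothetical, not data from the study).
# Assumed convention: 1 = lung squamous cell carcinoma, 0 = lung adenocarcinoma.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print("accuracy:", accuracy(labels, probs))
print("AUC:", roc_auc(labels, probs))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two values are usually reported together.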
