Journal of Capital Medical University ›› 2019, Vol. 40 ›› Issue (6): 889-893. doi: 10.3969/j.issn.1006-7795.2019.06.015

Classification prediction of lung squamous cell carcinoma and lung adenocarcinoma based on XGBoost

Leng Fei, Li Wei   

  1. Genetics and Birth Defects Control Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute, MOE Key Laboratory of Major Diseases in Children, Beijing 100045, China
  • Received: 2019-03-19 Online: 2019-11-21 Published: 2019-12-18
  • Supported by:
    This study was supported by the National Key Research and Development Project (2016YFC1000306).

Abstract: Objective To classify lung cancer into its lung squamous cell carcinoma and lung adenocarcinoma subtypes and to identify molecular markers that distinguish them. Methods The mRNA expression of the two subtypes was analyzed. Genes with significant differential expression were selected, and the extreme gradient boosting (XGBoost) algorithm was used to construct a model for predicting the lung cancer subtype. Its prediction performance was compared with that of a logistic regression classification model and a support vector machine (SVM) model. Results The XGBoost model achieved a prediction accuracy of 96.55% and an area under the curve (AUC) of 99.04%, outperforming both the logistic regression and SVM classification models. In addition, 11 genes were identified as molecular markers for the two subtypes. Conclusion Lung squamous cell carcinoma and lung adenocarcinoma differ significantly at the molecular level, and these differences can assist clinicians in predicting disease subtypes.
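
The two metrics reported in the abstract, accuracy and AUC, can be illustrated with a minimal Python sketch. This is not the authors' code: the label encoding (1 for lung adenocarcinoma, 0 for lung squamous cell carcinoma), the 0.5 decision threshold, the function names, and the six toy scores are all assumptions made for illustration, and the AUC is computed from its rank-based definition rather than by any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
# Assumed encoding: 1 = lung adenocarcinoma, 0 = lung squamous cell carcinoma.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # model-predicted probability of class 1

# Assumed decision threshold of 0.5 to turn scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"AUC      = {roc_auc(y_true, scores):.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the scores rank the two subtypes across all thresholds, which is why a model can report an AUC higher than its accuracy, as in the results above.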

Key words: transcriptome, lung squamous cell carcinoma, lung adenocarcinoma, machine learning, disease prediction
