Journal of Capital Medical University ›› 2019, Vol. 40 ›› Issue (5): 731-737. doi: 10.3969/j.issn.1006-7795.2019.05.013

• Basic Research •

Establishment of a model for predicting brain tissue gene expression levels from whole blood gene expression profiles

Xu Wenjian, Li Wei

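  1. National Center for Children's Health; Genetics and Birth Defects Control Center, Beijing Children's Hospital, Capital Medical University; Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children, Beijing 100045
  • Received: 2019-03-13 Online: 2019-09-21 Published: 2019-12-16
  • Corresponding author: Li Wei E-mail: liwei@bch.com.cn
  • Supported by:
    Key Research and Development Program of the Ministry of Science and Technology (2016YFC1000306).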

Building a prediction model of brain tissues gene expression based on whole blood gene expression profiles

Xu Wenjian, Li Wei   

  1. Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute;MOE Key Laboratory of Major Diseases in Children;Genetics and Birth Defects Control Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
  • Received:2019-03-13 Online:2019-09-21 Published:2019-12-16
  • Supported by:
    This study was supported by the Ministry of Science and Technology of China (2016YFC1000306).

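Abstract: Objective Gene expression analysis is a powerful tool for interpreting biological phenotypes and supporting disease diagnosis. Collecting brain tissue samples for gene expression experiments is difficult, risky, and costly, so an alternative way of obtaining expression profiles that avoids brain tissue sampling is urgently needed. Methods Whole blood gene expression profiles paired with brain tissue samples in the Genotype-Tissue Expression (GTEx) database were used as input features, and the gene expression levels in each of 13 brain tissues were used as targets. Many-to-many associations between whole blood gene expression levels and the expression level of any given gene in brain tissue were mined, and a regression model was then built to predict gene expression levels in unsampled brain tissue from whole blood gene expression profiles. Results For each gene, the 15 most relevant whole blood gene expression features were extracted to form a new low-dimensional feature set, and linear regression models were built for the expression levels of all genes in the 13 brain tissues. The mean absolute error of the prediction models ranged from 0.406 to 0.542, and the root mean square error ranged from 0.558 to 0.941. Conclusion This study proposes a model for predicting brain tissue gene expression levels from whole blood gene expression data and shows that whole blood expression profiles alone can predict gene expression levels in unsampled brain tissue with reasonable accuracy. The approach may make it possible to avoid surgical sampling of brain tissue in transcriptome studies and offers an alternative tool for gene expression profiling in brain-related diseases.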

Key words: brain tissue, whole blood, gene expression, prediction model, feature selection

Abstract: Objective Gene expression analysis is a powerful tool for explaining biological phenotypes and assisting disease diagnosis. Collecting brain tissue samples for gene expression experiments is difficult, risky, and expensive, so an alternative method that obtains brain tissue expression profiles from more accessible samples is urgently needed. Methods Using whole blood gene expression profiles matched with brain tissue samples from the GTEx (Genotype-Tissue Expression) database as input features and the expression profiles of 13 brain tissues as targets, we mined many-to-many correlations between gene expression levels in whole blood and those in each brain tissue. We then constructed a regression model that predicts gene expression levels in unavailable brain tissue from whole blood gene expression profiles. Results A new low-dimensional feature dataset was constructed for each gene in each brain tissue by extracting the 15 most relevant gene expression features from whole blood, and linear regression prediction models were built for all genes in the 13 brain tissues. The mean absolute error (MAE) of the prediction models ranged from 0.406 to 0.542, and the root mean square error (RMSE) ranged from 0.558 to 0.941. Conclusion A model for predicting gene expression in brain tissues from whole blood gene expression profiles is proposed. The results show that gene expression in unsampled brain tissue can be predicted relatively accurately from whole blood expression profile data alone. Surgical sampling of brain tissue may therefore be avoidable in transcriptome research, which provides an alternative for studying gene expression profiles in brain tissue-related diseases.

Key words: brain tissue, whole blood, gene expression, prediction model, feature selection
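
The workflow outlined in the abstract (select the 15 most relevant whole blood features for one brain gene, fit a linear regression, report MAE and RMSE) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not state how relevance is measured, so ranking by absolute Pearson correlation is an assumption, as are the train/test split, the synthetic data standing in for GTEx profiles, and all function names.

```python
import numpy as np

def top_k_features(X, y, k=15):
    """Indices of the k columns of X with the largest |Pearson r| against y.
    Assumption: correlation ranking stands in for the paper's relevance measure."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get correlation 0
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(r))[:k]

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, weights...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def evaluate_gene(blood_train, brain_train, blood_test, brain_test, k=15):
    """Select features on the training samples only, fit, then score held-out samples."""
    idx = top_k_features(blood_train, brain_train, k)
    coef = fit_linear(blood_train[:, idx], brain_train)
    pred = predict(coef, blood_test[:, idx])
    return idx, mae(brain_test, pred), rmse(brain_test, pred)

# Synthetic stand-in data (hypothetical sizes; not GTEx values).
rng = np.random.default_rng(0)
n_samples, n_blood_genes = 120, 500
blood = rng.normal(size=(n_samples, n_blood_genes))
# One brain gene that depends on two blood genes plus noise.
brain_gene = (0.8 * blood[:, 3] - 0.5 * blood[:, 42]
              + rng.normal(scale=0.3, size=n_samples))

train, test = slice(0, 90), slice(90, None)
idx, gene_mae, gene_rmse = evaluate_gene(
    blood[train], brain_gene[train], blood[test], brain_gene[test]
)
print(len(idx), round(gene_mae, 3), round(gene_rmse, 3))
```

In a full analysis, `evaluate_gene` would be repeated for every gene in each of the 13 brain tissues and the per-gene errors summarized per tissue. Selecting features on the training samples only keeps the held-out error estimate from being optimistically biased.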

CLC number: