Journal of Capital Medical University, 2022, Vol. 43, Issue 4: 584-591. doi: 10.3969/j.issn.1006-7795.2022.04.012

• Medical Informatics: Applications and Development •

A systematic approach to constructing DAG from knowledge graph

Bai Yongmei (1, 2, 3), Sun Huage (4), Du Jian (2)*

  1. Institute of Medical Technology, Peking University Health Science Center, Peking University, Beijing 100191, China;
  2. National Institute of Health Data Science, Peking University, Beijing 100191, China;
  3. School of Public Health, Peking University, Beijing 100191, China;
  4. School of Mathematics and Statistics, The University of Melbourne, Melbourne 3010, Australia
  • Received: 2022-02-24; Online: 2022-08-21; Published: 2022-10-28
  • Corresponding author (*): Du Jian, e-mail: dujian@bjmu.edu.cn
  • Supported by:
    This study was supported by the National Natural Science Foundation of China General Program (72074006), the Young Elite Scientists Sponsorship Program by the China Association for Science and Technology (2017QNRC001), and the Peking University Health Science Center Talent Start-up Fund (BMU2021YJ008).

Abstract: Causal inference, as opposed to correlation analysis, is the primary goal of observational research based on big data. Causal graphs, drawn as directed acyclic graphs (DAGs), integrate a large amount of prior knowledge to visualize the complex causal relationships among variables. They have become an important tool for developing causal inference strategies. However, the construction of a causal graph for a specific research question currently relies heavily on expert knowledge and local experience. There is an urgent need to construct causal graphs systematically from the perspective of the entire body of medical knowledge. Extracting medical knowledge from existing publications is the basis for the systematic construction of DAGs. In this paper, we introduce a structured medical knowledge platform developed on the SemMedDB database of the U.S. National Institutes of Health. Taking an interdisciplinary perspective, we define a causal graph as the complex network linking the concepts involved in a research question (the head and tail concepts) with all of their third-party variables. On this basis, we propose two strategies for systematically constructing DAGs: (1) pruning a knowledge graph into a causal graph; and (2) synthesizing evidence claims based on the population-interventions/exposure-comparisons-outcomes (PI/ECO) framework into a causal graph.
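
The abstract names the first strategy, pruning a knowledge graph into a causal graph, without spelling out the procedure. As a rough illustration only, here is a minimal Python sketch under stated assumptions. It does four things:

- turns subject-predicate-object triples of the kind stored in SemMedDB into a directed graph;
- keeps the concepts linked to a head concept (exposure) or tail concept (outcome) by a directed path;
- checks that the result is acyclic;
- labels each third-party variable as a confounder, mediator, or collider.

None of this is the authors' method. The set of predicates treated as causal, the relevance rule, the simplified role definitions and the helper names (`build_graph`, `prune_and_classify`, etc.) are hypothetical. The toy triples are invented and are not SemMedDB records.

```python
from collections import defaultdict, deque

# Assumption: which predicates count as causal is an illustrative choice,
# not a rule taken from the paper or from SemMedDB.
CAUSAL_PREDICATES = {"CAUSES", "PREDISPOSES", "AFFECTS", "PREVENTS", "TREATS"}


def build_graph(triples):
    """Keep only causal (subject, predicate, object) triples as directed edges."""
    children = defaultdict(set)
    for subj, pred, obj in triples:
        if pred in CAUSAL_PREDICATES and subj != obj:
            children[subj].add(obj)
    return children


def invert_edges(children):
    """Return the parent map (edges reversed) of a child map."""
    parents = defaultdict(set)
    for u, vs in children.items():
        for v in vs:
            parents[v].add(u)
    return parents


def reachable(edges, start, blocked=None):
    """Nodes reachable from start along `edges`, never entering `blocked`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt != blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_acyclic(children, nodes):
    """Kahn's algorithm restricted to `nodes`: True if no directed cycle."""
    indeg = {n: 0 for n in nodes}
    for u in nodes:
        for v in children.get(u, ()):
            if v in indeg:
                indeg[v] += 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in children.get(u, ()):
            if v in indeg:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return visited == len(nodes)


def prune_and_classify(triples, head, tail):
    """Prune a triple-based graph to the DAG around head -> tail and label roles."""
    children = build_graph(triples)
    parents = invert_edges(children)
    # Hypothetical relevance rule: keep concepts that are ancestors or
    # descendants of the head or the tail concept.
    nodes = {head, tail}
    for concept in (head, tail):
        nodes |= reachable(children, concept) | reachable(parents, concept)
    if not is_acyclic(children, nodes):
        raise ValueError("pruned graph has a cycle, so it is not a DAG")

    desc_head = reachable(children, head)
    anc_tail = reachable(parents, tail)
    anc_head_no_tail = reachable(parents, head, blocked=tail)
    anc_tail_no_head = reachable(parents, tail, blocked=head)
    desc_head_no_tail = reachable(children, head, blocked=tail)
    desc_tail_no_head = reachable(children, tail, blocked=head)

    roles = {}
    for z in nodes - {head, tail}:
        if z in desc_head and z in anc_tail:
            roles[z] = "mediator"      # head -> ... -> z -> ... -> tail
        elif z in anc_head_no_tail and z in anc_tail_no_head:
            roles[z] = "confounder"    # common cause of head and tail
        elif z in desc_head_no_tail and z in desc_tail_no_head:
            roles[z] = "collider"      # common effect of head and tail
        else:
            roles[z] = "other"
    return nodes, roles


# Invented toy triples (not real SemMedDB output).
triples = [
    ("Age", "PREDISPOSES", "Obesity"),
    ("Age", "PREDISPOSES", "Type 2 diabetes"),
    ("Obesity", "CAUSES", "Insulin resistance"),
    ("Insulin resistance", "CAUSES", "Type 2 diabetes"),
    ("Obesity", "AFFECTS", "Hospitalization"),
    ("Type 2 diabetes", "AFFECTS", "Hospitalization"),
    ("Obesity", "COEXISTS_WITH", "Hypertension"),  # non-causal predicate, dropped
]
nodes, roles = prune_and_classify(triples, "Obesity", "Type 2 diabetes")
print(sorted(roles.items()))
# [('Age', 'confounder'), ('Hospitalization', 'collider'), ('Insulin resistance', 'mediator')]
```

The role labels follow the usual path patterns: a common cause, an intermediate step, and a common effect. A realistic pipeline would also have to weigh conflicting or low-confidence claims and the direction of predicates such as `TREATS`, which this sketch ignores.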

Key words: directed acyclic graph, knowledge graph, evidence synthesis, confounder, mediator, collider

CLC number: