Modeling and analysis of disease and risk factors through learning Bayesian networks from observational data

Jing Li, Jianjun Shi, Devin Satz

Research output: Contribution to journal, Article, peer-review

15 Scopus citations


This paper focuses on identification of the relationships between a disease and its potential risk factors using Bayesian networks in an epidemiologic study, with the emphasis on integrating medical domain knowledge and statistical data analysis. An integrated approach is developed to identify the risk factors associated with patients' occupational histories and is demonstrated using real-world data. This approach includes several steps. First, raw data are preprocessed into a format that is acceptable to the learning algorithms of Bayesian networks. Some important considerations are discussed to address the uniqueness of the data and the challenges of the learning. Second, a Bayesian network is learned from the preprocessed data set by integrating medical domain knowledge and generic learning algorithms. Third, the relationships revealed by the Bayesian network are used for risk factor analysis, including identification of a group of people who share certain common characteristics and have a relatively high probability of developing the disease, and prediction of a person's risk of developing the disease given information on his/her occupational history.
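The third step of the approach (ranking groups by disease probability and predicting an individual's risk) can be illustrated with a minimal sketch. It is not the paper's method or data: the network structure (two parents pointing at the disease node), the variable names `exposure` and `smoker`, the records, and the Laplace smoothing constant are all assumptions made for illustration, and the sketch treats sample frequencies as if they were risk estimates.

```python
from collections import Counter
from itertools import product

# Hypothetical, hand-made records: (exposure, smoker, disease), all binary.
# These are NOT data from the paper.
RECORDS = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0),
    (1, 0, 1), (1, 0, 0), (1, 0, 0),
    (0, 1, 1), (0, 1, 0), (0, 1, 0), (0, 1, 0),
    (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 1),
]


def fit_cpt(records, alpha=1.0):
    """Estimate P(disease=1 | exposure, smoker) with Laplace smoothing.

    The structure exposure -> disease <- smoker is assumed to be fixed
    in advance (for example, by domain knowledge); only the conditional
    probability table is learned here.
    """
    total = Counter()  # number of records per parent configuration
    cases = Counter()  # number of diseased records per configuration
    for exposure, smoker, disease in records:
        total[(exposure, smoker)] += 1
        cases[(exposure, smoker)] += disease
    return {
        parents: (cases[parents] + alpha) / (total[parents] + 2 * alpha)
        for parents in product((0, 1), repeat=2)
    }


def rank_groups(cpt):
    """Return parent configurations sorted from highest to lowest risk."""
    return sorted(cpt.items(), key=lambda item: item[1], reverse=True)


def predict_risk(cpt, exposure, smoker):
    """Look up the estimated disease probability for one person."""
    return cpt[(exposure, smoker)]


cpt = fit_cpt(RECORDS)
ranked = rank_groups(cpt)
for (exposure, smoker), risk in ranked:
    print(f"exposure={exposure} smoker={smoker} risk={risk:.3f}")
```

A real analysis would learn the structure as well as the parameters, and would handle many more variables than this two-parent toy network.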

Original language: English (US)
Pages (from-to): 291-302
Number of pages: 12
Journal: Quality and Reliability Engineering International
Issue number: 3
State: Published - Apr 2008


Keywords

  • Bayesian network
  • Case-control study
  • Causal inference
  • Epidemiology

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Management Science and Operations Research

