Know-GRRF: Domain-Knowledge Informed Biomarker Discovery with Random Forests

Xin Guan, Li Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations


Due to its robustness and built-in feature selection capability, random forest is frequently employed in omics studies for biomarker discovery and predictive modeling. However, random forest assumes equal importance of all features, while in reality domain knowledge may justify the prioritization of more relevant features. Furthermore, it has been shown that an antecedent feature selection step can improve the performance of random forest by reducing noise and the search space. In this paper, we present a novel Know-guided regularized random forest (Know-GRRF) method that incorporates domain knowledge in a random forest framework for feature selection. Via rigorous simulations, we show that Know-GRRF outperforms existing methods by correctly identifying informative features and improving the accuracy of subsequent predictive models. Know-GRRF is responsive to a wide range of tuning parameters that help to better differentiate candidate features. Know-GRRF is also stable from run to run, making it robust to noise. We further prove that Know-GRRF is a generalized form of two existing methods, regularized random forest (RRF) and guided regularized random forest (GRRF). We applied Know-GRRF to a real-world radiation biodosimetry study that uses non-human primate data to discover biomarkers for human applications. By using cross-species correlation as domain knowledge, Know-GRRF was able to identify three gene markers that significantly improved the cross-species prediction accuracy. We implemented Know-GRRF as an R package that is available through the CRAN archive.
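
The relationship between RRF, GRRF, and a knowledge-guided penalty can be illustrated with a minimal Python sketch. It uses the penalty form published for RRF and GRRF: the split gain of a feature that has not yet been selected is multiplied by a coefficient in (0, 1]. The abstract does not state how Know-GRRF maps domain knowledge to coefficients, so the sketch simply feeds prior scores through the GRRF linear form; that mapping, and all function and variable names, are assumptions made for illustration and are not taken from the paper or from the R package.

```python
from typing import List, Sequence, Set


def guided_coefs(scores: Sequence[float], gamma: float, base: float = 1.0) -> List[float]:
    """Per-feature penalty coefficients: (1 - gamma) * base + gamma * score.

    `scores` are assumed to lie in [0, 1]. In GRRF they are normalized
    importances from a preliminary forest; here they stand in for
    domain-knowledge scores (an assumption, not the paper's formula).
    With gamma = 0 every feature gets `base`, which is the RRF case.
    """
    return [(1.0 - gamma) * base + gamma * s for s in scores]


def regularized_gain(gain: float, feature: int, selected: Set[int],
                     coefs: Sequence[float]) -> float:
    """Penalize the gain only if the feature has not been selected before."""
    return gain if feature in selected else coefs[feature] * gain


def pick_feature(gains: Sequence[float], selected: Set[int],
                 coefs: Sequence[float]) -> int:
    """Return the index of the feature with the largest regularized gain."""
    penalized = [regularized_gain(g, i, selected, coefs) for i, g in enumerate(gains)]
    return max(range(len(gains)), key=lambda i: penalized[i])


# Hypothetical prior scores and raw split gains for three features.
prior_scores = [1.0, 0.5, 0.0]
raw_gains = [0.30, 0.32, 0.35]

coefs = guided_coefs(prior_scores, gamma=0.5)
chosen_with_prior = pick_feature(raw_gains, set(), coefs)
chosen_without_penalty = pick_feature(raw_gains, set(), [1.0, 1.0, 1.0])
```

In this toy setup the unpenalized split would take the third feature, which has the largest raw gain, while the penalized split favors the first feature because its prior score leaves its gain untouched. Setting `gamma=0` removes the influence of the scores and leaves a single shared coefficient, which is how a per-feature scheme of this kind contains RRF as a special case.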

Original language: English (US)
Title of host publication: Bioinformatics and Biomedical Engineering - 6th International Work-Conference, IWBBIO 2018, Proceedings
Editors: Ignacio Rojas, Francisco Ortuno
Publisher: Springer Verlag
Number of pages: 12
ISBN (Print): 9783319787589
State: Published - 2018
Event: 6th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2018 - Granada, Spain
Duration: Apr 25 2018 to Apr 27 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10814 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 6th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2018


Keywords

  • Biomarker discovery
  • Domain knowledge
  • Feature selection
  • Regularized random forest

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

