Cancer pathways: Automatic extraction, representation, and reasoning in the 'big data' era

Graciela Gonzalez, Chitta Baral, Jeff Kiefer, Seungchan Kim, Jieping Ye

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


There has been great interest, and many research initiatives, in the biomedical community around harnessing "big data", including data from the literature, high-throughput gene expression experiments, array CGH, high-throughput siRNA, and many other types of data. The aim is to generate novel hypotheses that address the most crucial biomedical questions and aid in the discovery of more effective therapeutic options for complex and pervasive diseases such as cancer. Cancer research has progressed rapidly in the last decade with the implementation of high-dimensional genomic technologies. The large amount of data generated over the years has enabled a systems-based approach to uncovering and elucidating the complex signaling networks associated with cancer. However, even though new technologies have advanced our understanding of cancer biology beyond what could be imagined even a decade ago, unique challenges remain that stem precisely from the amount of data now routinely generated from even a single patient. The data must be stored and processed, and novel analysis strategies are called for to uncover new insights into cancer biology that are hidden in 'big data'. Interest in taming 'big data' through methods and systems that extract, represent, and transform it into knowledge usable for reasoning and question answering will only increase over time, enabling scientists to finally use the data for personalized treatment, discovery, and validation. Work presented in this session includes novel approaches to exploring cancer gene expression data, applying algebraic topology (Lockwood and Krishnamoorthy) and denoising autoencoders (Tan et al.) to identify significant properties of genomic data that cannot be found by traditional algorithms.
There is also a novel methodology for leveraging somatic mutation data to predict survival in cancer samples (Kim et al.), and a computational system for automated gene expression pattern annotation on mouse brain images that could prove key to understanding the pathogenesis of brain tumors and their early detection (Yang et al.). With respect to knowledge extraction, this session includes work on a weakly supervised machine learning approach for automatic pathway extraction from PubMed abstracts (Poon et al.), and on the use of protein interaction data from multiple sources to investigate mutations in 125 genes that were earlier identified as driving tumorigenesis when mutated (Engin et al.).

Original language: English (US)
Title of host publication: 20th Pacific Symposium on Biocomputing, PSB 2015
Publisher: Stanford University
Number of pages: 4
State: Published - 2015
Event: 20th Pacific Symposium on Biocomputing, PSB 2015 - Big Island, United States
Duration: Jan 4 2015 → Jan 8 2015


Other: 20th Pacific Symposium on Biocomputing, PSB 2015
Country/Territory: United States
City: Big Island

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Biomedical Engineering

