Logistic regression augmented community detection for network data with application in identifying autism-related gene pathways

Yunpeng Zhao, Qing Pan, Chengan Du

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


When searching for gene pathways leading to specific disease outcomes, additional information on gene characteristics is often available. This information may help differentiate disease-related genes from the irrelevant background when links involving both types of genes are observed but the genes' relationships to the disease are unknown. We propose a method that uses a logistic regression on this auxiliary information to single out irrelevant background genes, and uses the adjacency matrix to cluster the relevant genes into cohesive groups. The expectation–maximization (EM) algorithm is modified to maximize a joint pseudo-likelihood that assumes latent indicators of relevance to the disease, latent group memberships, and Poisson- or multinomial-distributed link counts within and between groups. A robust version allowing arbitrary linkage patterns within the background is further derived. Asymptotic consistency of the label assignments under the stochastic blockmodel is proven. Simulation studies show superior performance and robustness in finite samples. Applied to diverse data sources, including de novo mutations, gene expression, and protein–protein interactions, the proposed robust method identifies previously missed gene sets underlying autism-related neurological diseases.
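The abstract describes the model only in words. The following is a minimal, hypothetical NumPy sketch of how a covariate-augmented EM of this kind could be organized; it is not the authors' algorithm. It assumes that label 0 denotes the irrelevant background and labels 1..K the relevant communities, that the prior probability of relevance is logistic in the covariates `X` (which include an intercept column), and that the pseudo-likelihood treats each node's link counts into the currently assigned blocks as independent Poisson variables given that node's label. The function name `augmented_pl_em`, the ridge penalty, the random initialization, and the fixed iteration count are illustrative choices. The robust variant and the multinomial version are not shown.

```python
# Illustrative sketch only; not the authors' algorithm.
import numpy as np


def sigmoid(t):
    # Clip to avoid overflow warnings in np.exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -35.0, 35.0)))


def augmented_pl_em(A, X, K, n_iter=30, ridge=1.0, seed=0):
    """Hypothetical sketch of covariate-augmented community detection.

    A: (n, n) symmetric matrix of nonnegative link counts.
    X: (n, p) covariates, including an intercept column.
    K: number of relevant communities; label 0 is the background.
    Returns hard labels (n,), posterior label probabilities (n, K+1), beta (p,).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    L = K + 1
    e = rng.integers(0, L, size=n)          # random initial hard labels
    tau = np.eye(L)[e]                      # soft assignments, initially one-hot
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Block-wise link counts of each node under the current hard labels.
        B = A @ np.eye(L)[e]                # B[i, k] = links from node i into block k
        # M-step (1): weighted Poisson rates and community proportions.
        w = tau.sum(axis=0) + 1e-10
        Lam = (tau.T @ B) / w[:, None] + 1e-10  # Lam[l, k]: mean links from a label-l node into block k
        pi = w[1:] / w[1:].sum()
        # M-step (2): ridge-penalized logistic regression of relevance on X,
        # with soft targets r_i = P(label_i != 0), fitted by a few Newton steps.
        r = 1.0 - tau[:, 0]
        for _ in range(5):
            q = sigmoid(X @ beta)
            grad = X.T @ (r - q) - ridge * beta
            H = X.T @ (X * (q * (1.0 - q))[:, None]) + ridge * np.eye(p)
            beta += np.linalg.solve(H, grad)
        # E-step: logistic prior times Poisson pseudo-likelihood (log b! terms dropped).
        q = sigmoid(X @ beta)
        log_prior = np.column_stack(
            [np.log(1.0 - q + 1e-12), np.log(q + 1e-12)[:, None] + np.log(pi)]
        )
        loglik = B @ np.log(Lam).T - Lam.sum(axis=1)[None, :]
        logpost = log_prior + loglik
        logpost -= logpost.max(axis=1, keepdims=True)
        tau = np.exp(logpost)
        tau /= tau.sum(axis=1, keepdims=True)
        e = tau.argmax(axis=1)              # update hard labels for the next pass
    return e, tau, beta


# Usage on simulated data: 20 background nodes (label 0) and two communities of 20.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1, 2], 20)
n = truth.size
X = np.column_stack([np.ones(n), np.where(truth > 0, 2.0, -2.0) + rng.normal(size=n)])
same = (truth[:, None] == truth[None, :]) & (truth[:, None] > 0)
A = np.triu(rng.poisson(np.where(same, 1.0, 0.1)), 1)
A = A + A.T                                 # symmetric link counts, zero diagonal
labels, tau, beta = augmented_pl_em(A, X, K=2)
print(labels)
```

Because the EM objective is non-convex, the fitted labels depend on the starting point. In practice one would try several seeds or a spectral initialization and keep the fit with the highest pseudo-likelihood.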

Original language: English (US)
Pages (from-to): 222-234
Number of pages: 13
Issue number: 1
State: Published - Mar 2019

Keywords

  • Autism spectral disease
  • Covariates augmented community detection
  • Expectation–maximization algorithm
  • Gene clustering
  • Logistic regression
  • Pseudo-likelihood

ASJC Scopus subject areas

  • Statistics and Probability
  • General Biochemistry, Genetics and Molecular Biology
  • General Immunology and Microbiology
  • General Agricultural and Biological Sciences
  • Applied Mathematics
