Developing a machine learning model to identify protein–protein interaction hotspots to facilitate drug discovery

Rohit Nandakumar, Valentin Dinu

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Throughout the history of drug discovery, new drug molecules have primarily been identified using an enzyme-based approach. Recently, protein–protein interfaces that can be disrupted by small molecules have been identified as viable drug targets for certain diseases, such as cancer and HIV infection. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many of these studies do not effectively integrate information about amino acid chains or other structural features of the complex. Herein, (1) a machine learning model for predicting protein–protein interface hotspots has been developed, and (2) the extent to which integrating multiple features, such as those associated with amino acid chains, improves its predictions has been evaluated. A virtual drug screen has also been performed against a set of hotspots identified on the EphB2-ephrinB2 complex. The model achieves an AUROC of 0.842, a sensitivity (recall) of 0.833, and a specificity of 0.850. Virtual screening of the hotspots predicted by the model has identified potential medications for diseases caused by overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, and colorectal cancers and melanoma, which are linked to EphB2 mutations. The model's efficacy is demonstrated by its recovery of drug–disease associations for these conditions that were previously reported in the literature, including cimetidine, idarubicin, and pralatrexate. In addition, nadolol, a beta blocker, was identified in this study as binding to the EphB2-ephrinB2 complex; its potential to treat multiple cancers remains relatively unexplored.
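The abstract reports performance as AUROC, sensitivity (recall), and specificity. As a reminder of how these three numbers are defined for a binary hotspot/non-hotspot classifier, here is a minimal sketch on made-up toy labels and scores (not data from the study). The function names and the 0.5 decision threshold are illustrative assumptions; AUROC is computed with the pairwise (Mann-Whitney) formulation, i.e. the probability that a randomly chosen true hotspot is scored higher than a randomly chosen non-hotspot, with ties counted as one half.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for binary labels, where 1 = hotspot."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true hotspots that are predicted as hotspots: TP / (TP + FN)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of non-hotspots correctly predicted as non-hotspots: TN / (TN + FP)."""
    _, _, tn, fp = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three true hotspots (label 1) and four non-hotspots (label 0).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(sensitivity(y_true, y_pred), 3))  # 0.667 (2 of 3 hotspots found)
print(round(specificity(y_true, y_pred), 3))  # 0.75  (3 of 4 non-hotspots rejected)
print(round(auroc(y_true, scores), 3))        # 0.917 (11 of 12 pairs ranked correctly)
```

Note that sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the three are usually reported together.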

Original language: English (US)
Article number: e10381
State: Published - Dec 7 2020

Keywords

  • Drug discovery
  • Machine learning
  • Protein-protein interaction

ASJC Scopus subject areas

  • General Neuroscience
  • General Biochemistry, Genetics and Molecular Biology
  • General Agricultural and Biological Sciences

