Comprehensive Prediction of Molecular Recognition in a Combinatorial Chemical Space Using Machine Learning

Alexander T. Taguchi, James Boyd, Chris Diehnelt, Joseph B. Legutki, Zhan-Gong Zhao, Neal W. Woodbury

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


In combinatorial chemical approaches, optimizing the composition and arrangement of building blocks toward a particular function has been done using a number of methods, including high throughput molecular screening, molecular evolution, and computational prescreening. Here, a different approach is considered that uses sparse measurements of library molecules as the input to a machine learning algorithm which generates a comprehensive, quantitative relationship between covalent molecular structure and function that can then be used to predict the function of any molecule in the possible combinatorial space. To test the feasibility of the approach, a defined combinatorial chemical space consisting of ~10¹² possible linear combinations of 16 different amino acids was used. The binding of a very sparse, but nearly random, sampling of this amino acid sequence space to 9 different protein targets was measured and used to generate a general relationship between peptide sequence and binding for each target. Surprisingly, measuring as few as a few hundred to a few thousand of the ~10¹² possible molecules provides sufficient training to be highly predictive of the binding of the remaining molecules in the combinatorial space. Furthermore, measuring only amino acid sequences that bind weakly to a target allows the accurate prediction of which sequences will bind 10-100 times more strongly. Thus, the molecular recognition information contained in a tiny fraction of molecules in this combinatorial space is sufficient to characterize any set of molecules randomly selected from the entire space, a fact that potentially has significant implications for the design of new chemical function using combinatorial chemical libraries.
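To give a sense of the scale involved, the following is a minimal Python sketch, not the authors' code. It counts the sequences in a 16-letter peptide space and encodes one peptide as a fixed-size numeric vector of the kind a machine learning model could take as input. The particular 16-letter alphabet, the maximum length of 10, and the names `ALPHABET`, `MAX_LEN`, `space_size`, and `one_hot` are all assumptions made for illustration; the abstract specifies neither the amino acids used nor the peptide lengths.

```python
import numpy as np

# Hypothetical 16-letter alphabet: the abstract does not say which
# 16 amino acids were used, so this choice is for illustration only.
ALPHABET = "ADEFGHKLNPQRSVWY"
# Assumed maximum peptide length (not stated in the abstract).
MAX_LEN = 10


def space_size(n_letters: int, length: int) -> int:
    """Number of distinct linear sequences of exactly `length` residues."""
    return n_letters ** length


def one_hot(seq: str, alphabet: str = ALPHABET, max_len: int = MAX_LEN) -> np.ndarray:
    """Encode a peptide as a flat 0/1 vector of size max_len * len(alphabet).

    Positions beyond the end of the peptide are left as all zeros.
    """
    if len(seq) > max_len:
        raise ValueError("sequence is longer than max_len")
    index = {aa: i for i, aa in enumerate(alphabet)}
    x = np.zeros((max_len, len(alphabet)), dtype=np.float32)
    for pos, aa in enumerate(seq):
        x[pos, index[aa]] = 1.0
    return x.ravel()


if __name__ == "__main__":
    total = space_size(len(ALPHABET), MAX_LEN)
    sampled = 1_000  # "a few hundred to a few thousand" measured molecules
    print(f"sequences of length {MAX_LEN}: {total:.2e}")
    print(f"fraction sampled by {sampled} measurements: {sampled / total:.1e}")
    print("encoded vector length:", one_hot("ADEF").size)
```

Under these assumptions, sequences of length 10 alone number 16^10, about 1.1 × 10¹², which is consistent with the order of magnitude quoted in the abstract; a thousand measurements would then cover roughly one part in a billion of that space.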

Original language: English (US)
Pages (from-to): 500-508
Number of pages: 9
Journal: ACS Combinatorial Science
Issue number: 10
State: Published - Oct 12 2020


Keywords

  • affinity
  • ligand
  • machine learning
  • molecular recognition
  • neural network
  • peptide array
  • prediction
  • protein target

ASJC Scopus subject areas

  • Chemistry(all)

