TY - JOUR
T1 - ExpertRNA
T2 - A New Framework for RNA Secondary Structure Prediction
AU - Liu, Menghan
AU - Poppleton, Erik
AU - Pedrielli, Giulia
AU - Šulc, Petr
AU - Bertsekas, Dimitri P.
N1 - Funding Information:
The research presented in this paper was partially supported by the National Science Foundation [Grant 2007861] to principal investigator Dr. Pedrielli.
History: Accepted by Paul Brooks, Area Editor for Applications in Biology, Medicine, & Healthcare.
Publisher Copyright:
© 2022 INFORMS.
PY - 2022/9
Y1 - 2022/9
AB - Ribonucleic acid (RNA) is a fundamental biological molecule that is essential to all living organisms and performs a versatile array of cellular tasks. The function of many RNA molecules is strongly related to the structures they adopt. As a result, great effort is being dedicated to the design of efficient algorithms that solve the “folding problem”: given a sequence of nucleotides, return a probable list of base pairs, referred to as the secondary structure prediction. Early algorithms largely relied on finding the structure with minimum free energy. However, these predictions rely on effective simplified free energy models, which may fail to identify the correct structure as the one with the lowest free energy. In light of this, new data-driven approaches that not only consider free energy but also use machine learning techniques to learn motifs have been investigated and have recently been shown to outperform free energy–based algorithms on several experimental data sets. In this work, we introduce ExpertRNA, a new algorithm that provides a modular framework that can easily incorporate an arbitrary number of rewards (free energy or nonparametric/data driven) and secondary structure prediction algorithms. We argue that this capability of ExpertRNA has the potential to balance out the different strengths and weaknesses of state-of-the-art folding tools. We test ExpertRNA on several RNA sequence-structure data sets and compare its performance against a state-of-the-art folding algorithm. We find that ExpertRNA produces, on average, more accurate predictions of nonpseudoknotted secondary structures than the underlying structure prediction algorithm it uses, thus validating the promise of the approach.
KW - applications
KW - biology
KW - computational methods
KW - computational science
KW - deterministic
KW - dynamic programming
KW - industries
KW - pharmaceutical
UR - http://www.scopus.com/inward/record.url?scp=85134402148&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134402148&partnerID=8YFLogxK
U2 - 10.1287/ijoc.2022.1188
DO - 10.1287/ijoc.2022.1188
M3 - Article
AN - SCOPUS:85134402148
SN - 1091-9856
VL - 34
SP - 2464
EP - 2484
JO - INFORMS Journal on Computing
JF - INFORMS Journal on Computing
IS - 5
ER -