TY - JOUR
T1 - Digital medicine and the curse of dimensionality
AU - Berisha, Visar
AU - Krantsevich, Chelsea
AU - Hahn, P. Richard
AU - Hahn, Shira
AU - Dasarathy, Gautam
AU - Turaga, Pavan
AU - Liss, Julie
N1 - Funding Information:
This work was supported by the National Institutes of Health under R01 grants 5R01DC006859-13 and 1R01GM140468-01; the Office of Naval Research under grant N00014-17-1-2826; and the National Science Foundation under grant CCF-2048223.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - Digital health data are multimodal and high-dimensional. A patient’s health state can be characterized by a multitude of signals including medical imaging, clinical variables, genome sequencing, conversations between clinicians and patients, and continuous signals from wearables, among others. This high volume, personalized data stream aggregated over patients’ lives has spurred interest in developing new artificial intelligence (AI) models for higher-precision diagnosis, prognosis, and tracking. While the promise of these algorithms is undeniable, their dissemination and adoption have been slow, owing partially to unpredictable AI model performance once deployed in the real world. We posit that one of the rate-limiting factors in developing algorithms that generalize to real-world scenarios is the very attribute that makes the data exciting—their high-dimensional nature. This paper considers how the large number of features in vast digital health data can challenge the development of robust AI models—a phenomenon known as “the curse of dimensionality” in statistical learning theory. We provide an overview of the curse of dimensionality in the context of digital health, demonstrate how it can negatively impact out-of-sample performance, and highlight important considerations for researchers and algorithm designers.
UR - http://www.scopus.com/inward/record.url?scp=85118356942&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118356942&partnerID=8YFLogxK
U2 - 10.1038/s41746-021-00521-5
DO - 10.1038/s41746-021-00521-5
M3 - Review article
AN - SCOPUS:85118356942
SN - 2398-6352
VL - 4
JO - npj Digital Medicine
JF - npj Digital Medicine
IS - 1
M1 - 153
ER -