TY - JOUR
T1 - Being Properly Improper
AU - Sypherd, Tyler
AU - Nock, Richard
AU - Sankar, Lalitha
N1 - Funding Information:
We thank the anonymous reviewers for their comments and suggestions. This work is supported in part by NSF grants CIF-1901243, CIF-1815361, CIF-2007688, CIF-2134256, SaTC-2031799, and a Google AI for Social Good grant.
Publisher Copyright:
Copyright © 2022 by the author(s)
PY - 2022
Y1 - 2022
AB - Properness for supervised losses stipulates that the loss function shapes the learning algorithm towards the true posterior of the data generating distribution. Unfortunately, data in modern machine learning can be corrupted or twisted in many ways. Hence, optimizing a proper loss function on twisted data could perilously lead the learning algorithm towards the twisted posterior, rather than to the desired clean posterior. Many papers cope with specific twists (e.g., label/feature/adversarial noise), but there is a growing need for a unified and actionable understanding atop properness. Our chief theoretical contribution is a generalization of the properness framework with a notion called twist-properness, which delineates loss functions with the ability to “untwist” the twisted posterior into the clean posterior. Notably, we show that a nontrivial extension of a loss function called α-loss, which was first introduced in information theory, is twist-proper. We study the twist-proper α-loss under a novel boosting algorithm, called PILBOOST, and provide formal and experimental results for this algorithm. Our overarching practical conclusion is that the twist-proper α-loss outperforms the proper log-loss on several variants of twisted data.
UR - http://www.scopus.com/inward/record.url?scp=85148990798&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85148990798&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85148990798
SN - 2640-3498
VL - 162
SP - 20891
EP - 20932
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 39th International Conference on Machine Learning, ICML 2022
Y2 - 17 July 2022 through 23 July 2022
ER -