A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization

Tyler Sypherd, Mario Diaz, John Kevin Cava, Gautam Dasarathy, Peter Kairouz, Lalitha Sankar

Research output: Contribution to journal › Article › peer-review



We introduce a tunable loss function called $\alpha $ -loss, parameterized by $\alpha \in (0,\infty]$ , which interpolates between the exponential loss ( $\alpha = 1/2$ ), the log-loss ( $\alpha = 1$ ), and the 0-1 loss ( $\alpha = \infty $ ), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between $\alpha $ -loss and Arimoto conditional entropy, verify the classification-calibration of $\alpha $ -loss in order to demonstrate asymptotic optimality via Rademacher complexity generalization techniques, and build upon a notion called strictly local quasi-convexity in order to quantitatively characterize the optimization landscape of $\alpha $ -loss. Practically, we perform class imbalance, robustness, and classification experiments on benchmark image datasets using convolutional neural networks. Our main practical conclusion is that certain tasks may benefit from tuning $\alpha $ -loss away from log-loss ( $\alpha = 1$ ), and to this end we provide simple heuristics for the practitioner. In particular, navigating the $\alpha $ hyperparameter can readily provide superior model robustness to label flips ( $\alpha > 1$ ) and sensitivity to imbalanced classes ( $\alpha < 1$ ).
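As a concrete illustration of the interpolation described in the abstract, the sketch below implements the closed form of $\alpha$-loss as commonly stated for this family, $\ell^{\alpha}(p) = \frac{\alpha}{\alpha-1}\left(1 - p^{1-1/\alpha}\right)$, where $p$ is the probability the model assigns to the true label; the function name and interface are illustrative, not from the paper itself.

```python
import numpy as np

def alpha_loss(alpha, p):
    """alpha-loss of the probability p assigned to the true label.

    Recovers log-loss in the limit alpha -> 1, the exponential
    loss (1/p - 1) at alpha = 1/2, and the 0-1-loss surrogate
    (1 - p) as alpha -> infinity.
    """
    p = np.asarray(p, dtype=float)
    if np.isclose(alpha, 1.0):
        # log-loss arises as the limit of the general formula at alpha = 1
        return -np.log(p)
    # general case: (alpha / (alpha - 1)) * (1 - p**(1 - 1/alpha))
    return (alpha / (alpha - 1.0)) * (1.0 - p ** (1.0 - 1.0 / alpha))
```

For example, at $p = 0.5$ the loss equals $1/p - 1 = 1$ for $\alpha = 1/2$, $\log 2$ for $\alpha = 1$, and approaches $1 - p = 0.5$ for large $\alpha$, matching the three endpoints named in the abstract.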

Original language: English (US)
Pages (from-to): 6021-6051
Number of pages: 31
Journal: IEEE Transactions on Information Theory
Issue number: 9
State: Published - Sep 1 2022


Keywords

  • Arimoto conditional entropy
  • α-loss
  • classification-calibration
  • generalization
  • robustness
  • strictly local quasi-convexity

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
  • Computer Science Applications


