Regularization via Structural Label Smoothing

Research output: Contribution to journal › Conference article › peer-review

23 Scopus citations

Abstract

Regularization is an effective way to promote the generalization performance of machine learning models. In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network by softening the ground-truth labels in the training data in an attempt to penalize overconfident outputs. Existing approaches typically use cross-validation to select a smoothing parameter that is uniform across all training data. We show that such label smoothing imposes a quantifiable bias in the Bayes error rate of the training data, with regions of the feature space with high overlap and low marginal likelihood having a lower bias, and regions of low overlap and high marginal likelihood having a higher bias. These theoretical results motivate a simple objective function for data-dependent smoothing that mitigates the potential negative consequences of the operation while maintaining its desirable properties as a regularizer. We call this approach Structural Label Smoothing (SLS). We implement SLS and empirically validate it on several synthetic and benchmark datasets (including CIFAR-100). The results confirm our theoretical insights and demonstrate the effectiveness of the proposed method in comparison to traditional label smoothing.
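To make the operation described in the abstract concrete, below is a minimal PyTorch sketch (not the authors' implementation) of label-smoothed cross-entropy in which the smoothing coefficient may be either a single scalar, as in traditional label smoothing, or a per-example vector, as data-dependent smoothing requires. The function name `smoothed_cross_entropy` and the per-example coefficient values are illustrative assumptions only; the paper derives its actual coefficients from the bias analysis summarized above.

```python
# Minimal sketch: uniform label smoothing vs. a hypothetical per-example
# ("structural") smoothing coefficient. Not the authors' code; the
# per-example values below are placeholders for illustration.
import torch
import torch.nn.functional as F


def smoothed_cross_entropy(logits, targets, eps):
    """Cross-entropy against smoothed targets.

    logits:  (N, C) raw scores
    targets: (N,)   integer class labels
    eps:     scalar or (N,) smoothing coefficient in [0, 1)
    """
    n, c = logits.shape
    # Reshape eps so a scalar or a per-example vector broadcasts over classes.
    eps = torch.as_tensor(eps, dtype=logits.dtype).reshape(-1, 1)
    one_hot = F.one_hot(targets, c).to(logits.dtype)
    # Soften the hard labels: move eps of the probability mass to the uniform distribution.
    soft_targets = one_hot * (1.0 - eps) + eps / c
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(4, 10)
    targets = torch.tensor([1, 3, 0, 7])

    # Traditional label smoothing: one coefficient for every training example.
    print(smoothed_cross_entropy(logits, targets, eps=0.1))

    # Data-dependent smoothing: a different coefficient per example,
    # e.g. larger in high-overlap regions (values here are illustrative only).
    per_example_eps = torch.tensor([0.05, 0.20, 0.10, 0.15])
    print(smoothed_cross_entropy(logits, targets, per_example_eps))
```

The only difference between the two calls is whether `eps` is a scalar or a vector; the loss itself is unchanged, which is why a data-dependent scheme can retain the regularizing effect of label smoothing while varying its strength across the feature space.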

Original language: English (US)
Pages (from-to): 1453-1463
Number of pages: 11
Journal: Proceedings of Machine Learning Research
Volume: 108
State: Published - 2020
Event: 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020 - Virtual, Online
Duration: Aug 26 2020 - Aug 28 2020

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
