TY - GEN
T1 - Improving batch normalization with skewness reduction for deep neural networks
AU - Ding, Pak Lun Kevin
AU - Martin, Sarah
AU - Li, Baoxin
N1 - Publisher Copyright:
© 2020 IEEE
PY - 2020
Y1 - 2020
N2 - Batch Normalization (BN) is a well-known technique used in training deep neural networks. The main idea behind batch normalization is to normalize the features of each layer, i.e., to transform them to have zero mean and unit variance. Such a procedure encourages the optimization landscape of the loss function to be smoother, improving both the speed and the performance of network training. In this paper, we demonstrate that the performance of a network can be improved if the distributions of the output features within the same layer are similar. As normalizing based on mean and variance does not necessarily make the features follow the same distribution, we propose a new normalization scheme: Batch Normalization with Skewness Reduction (BNSR). Compared with other normalization approaches, BNSR transforms not only the mean and variance but also the skewness of the data. By tackling this property of a distribution, we can make the output distributions of the layers more similar. The nonlinearity of BNSR may further improve the expressiveness of the underlying network. Comparisons with other normalization schemes are conducted on the CIFAR-100 and ImageNet datasets. Experimental results show that the proposed approach can outperform state-of-the-art networks that are not equipped with BNSR.
AB - Batch Normalization (BN) is a well-known technique used in training deep neural networks. The main idea behind batch normalization is to normalize the features of each layer, i.e., to transform them to have zero mean and unit variance. Such a procedure encourages the optimization landscape of the loss function to be smoother, improving both the speed and the performance of network training. In this paper, we demonstrate that the performance of a network can be improved if the distributions of the output features within the same layer are similar. As normalizing based on mean and variance does not necessarily make the features follow the same distribution, we propose a new normalization scheme: Batch Normalization with Skewness Reduction (BNSR). Compared with other normalization approaches, BNSR transforms not only the mean and variance but also the skewness of the data. By tackling this property of a distribution, we can make the output distributions of the layers more similar. The nonlinearity of BNSR may further improve the expressiveness of the underlying network. Comparisons with other normalization schemes are conducted on the CIFAR-100 and ImageNet datasets. Experimental results show that the proposed approach can outperform state-of-the-art networks that are not equipped with BNSR.
UR - http://www.scopus.com/inward/record.url?scp=85110515891&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85110515891&partnerID=8YFLogxK
U2 - 10.1109/ICPR48806.2021.9412949
DO - 10.1109/ICPR48806.2021.9412949
M3 - Conference contribution
AN - SCOPUS:85110515891
T3 - Proceedings - International Conference on Pattern Recognition
SP - 7165
EP - 7172
BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Pattern Recognition, ICPR 2020
Y2 - 10 January 2021 through 15 January 2021
ER -
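
Note: the abstract above describes BNSR as normalizing the skewness of layer features in addition to their mean and variance. The following is a minimal NumPy sketch of that idea, appended after the record so the RIS fields remain parseable. It assumes a sign-preserving power transform as the skewness-reducing nonlinearity; the function name, the lam parameter, and the transform itself are illustrative stand-ins, not the paper's exact formulation (see DOI 10.1109/ICPR48806.2021.9412949 for the actual method).

    import numpy as np

    def bnsr_normalize(x, eps=1e-5, lam=0.9):
        # Standard batch-normalization step: zero mean, unit variance
        # per feature, computed over the batch dimension (axis 0).
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)

        # Hypothetical skewness-reducing nonlinearity: a sign-preserving
        # power transform (|x|**lam with lam < 1) compresses the long tail
        # of a skewed distribution. This is an illustrative stand-in for
        # the nonlinear transform the paper applies after normalization.
        y = np.sign(x_hat) * np.abs(x_hat) ** lam

        # Re-standardize so the output again has approximately zero mean
        # and unit variance after the nonlinearity.
        return (y - y.mean(axis=0)) / (y.std(axis=0) + eps)

    # Usage example: a right-skewed batch of 256 samples x 8 features.
    rng = np.random.default_rng(0)
    batch = rng.lognormal(size=(256, 8))
    out = bnsr_normalize(batch)
    print(out.mean(axis=0).round(3), out.std(axis=0).round(3))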