TY - GEN
T1 - Stochastic Gaussian Process Model Averaging for High-Dimensional Inputs
AU - Xuereb, Maxime
AU - Ng, Szu Hui
AU - Pedrielli, Giulia
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/12/14
Y1 - 2020/12/14
N2 - Many statistical learning methodologies lose efficiency and accuracy when applied to large, high-dimensional datasets, and this loss is exacerbated by noisy data. In this paper, we focus on Gaussian Processes (GPs), a family of non-parametric approaches used in machine learning and Bayesian Optimization. GPs scale poorly with both the size and the dimensionality of the input data. This paper presents, for the first time, the Stochastic GP Model Averaging (SGPMA) algorithm, which tackles both challenges. SGPMA uses a Bayesian approach to weight several predictors, each trained on an independent subset of the initial dataset (addressing the large-dataset issue) and defined in a low-dimensional embedding of the original space (addressing the high dimensionality). We conduct several experiments with varying input sizes and dimensionalities. The results show that our methodology is superior to naive averaging and that the choice of embedding is critical to managing the trade-off between computational cost and prediction accuracy.
AB - Many statistical learning methodologies lose efficiency and accuracy when applied to large, high-dimensional datasets, and this loss is exacerbated by noisy data. In this paper, we focus on Gaussian Processes (GPs), a family of non-parametric approaches used in machine learning and Bayesian Optimization. GPs scale poorly with both the size and the dimensionality of the input data. This paper presents, for the first time, the Stochastic GP Model Averaging (SGPMA) algorithm, which tackles both challenges. SGPMA uses a Bayesian approach to weight several predictors, each trained on an independent subset of the initial dataset (addressing the large-dataset issue) and defined in a low-dimensional embedding of the original space (addressing the high dimensionality). We conduct several experiments with varying input sizes and dimensionalities. The results show that our methodology is superior to naive averaging and that the choice of embedding is critical to managing the trade-off between computational cost and prediction accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85103911514&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103911514&partnerID=8YFLogxK
U2 - 10.1109/WSC48552.2020.9384114
DO - 10.1109/WSC48552.2020.9384114
M3 - Conference contribution
AN - SCOPUS:85103911514
T3 - Proceedings - Winter Simulation Conference
SP - 373
EP - 384
BT - Proceedings of the 2020 Winter Simulation Conference, WSC 2020
A2 - Bae, K.-H.
A2 - Feng, B.
A2 - Kim, S.
A2 - Lazarova-Molnar, S.
A2 - Zheng, Z.
A2 - Roeder, T.
A2 - Thiesing, R.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 Winter Simulation Conference, WSC 2020
Y2 - 14 December 2020 through 18 December 2020
ER -