The multi-fidelity multi-armed bandit

Kirthevasan Kandasamy; Gautam Dasarathy; Jeff Schneider; Barnabás Póczos

Research output: Contribution to journal › Conference article › peer-review

Abstract

We study a variant of the classical stochastic K-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, the forecaster may choose to play an arm at any one of M fidelities. The highest fidelity (desired outcome) expends cost λ^(M). The m^th fidelity (an approximation) expends λ^(m) < λ^(M) and returns a biased estimate of the highest fidelity. We develop MF-UCB, a novel upper confidence bound procedure for this setting, and prove that it naturally adapts to the sequence of available approximations and costs, thus attaining better regret than naive strategies which ignore the approximations. For instance, in the above online advertising example, MF-UCB would use the lower fidelities to quickly eliminate suboptimal ads and reserve the larger, expensive experiments for a small set of promising candidates. We complement this result with a lower bound and show that MF-UCB is nearly optimal under certain conditions.
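
The setting described in the abstract can be illustrated with a small simulation. The following is a minimal Python sketch of a multi-fidelity UCB-style loop, not the MF-UCB procedure from the paper: the function name `mf_ucb_sketch`, the bias bounds `zetas`, the switching thresholds `gammas`, the confidence width, and the Gaussian noise model are all assumptions introduced here for illustration. It only shows the general idea of playing cheap fidelities until their estimates are tight, then moving to the expensive one.

```python
import math
import random


def mf_ucb_sketch(means, zetas, costs, budget, rho=2.0, seed=0):
    """Simplified multi-fidelity UCB loop (illustrative assumptions only).

    means[m][k]: true mean of arm k at fidelity m; the last row is the target.
    zetas[m]:    assumed known bound on the bias of fidelity m; zetas[-1] == 0.
    costs[m]:    cost of one play at fidelity m, increasing in m.
    """
    rng = random.Random(seed)
    M, K = len(means), len(means[0])
    counts = [[0] * K for _ in range(M)]
    sums = [[0.0] * K for _ in range(M)]
    # Assumed rule: leave fidelity m once its confidence width drops below gammas[m].
    gammas = [zetas[m] * math.sqrt(costs[m] / costs[m + 1]) for m in range(M - 1)]
    spent, t = 0.0, 0

    def width(m, k):
        n = counts[m][k]
        return math.inf if n == 0 else math.sqrt(rho * math.log(t) / n)

    def bound(k):
        # Tightest upper bound on the target mean over the fidelities played so far.
        best = math.inf
        for m in range(M):
            if counts[m][k] > 0:
                best = min(best, sums[m][k] / counts[m][k] + width(m, k) + zetas[m])
        return best

    while spent < budget:
        t += 1
        arm = max(range(K), key=bound)  # optimistic arm choice
        fid = M - 1                     # default to the highest fidelity
        for m in range(M - 1):
            if width(m, arm) >= gammas[m]:
                fid = m                 # cheapest fidelity that is still uncertain
                break
        reward = means[fid][arm] + rng.gauss(0.0, 0.1)  # assumed noise model
        counts[fid][arm] += 1
        sums[fid][arm] += reward
        spent += costs[fid]
    return counts, spent


# Hypothetical instance: 3 arms, 2 fidelities, low fidelity biased by at most 0.3.
example_means = [[0.3, 0.5, 0.7], [0.4, 0.6, 0.9]]
counts, spent = mf_ucb_sketch(example_means, zetas=[0.3, 0.0],
                              costs=[1.0, 4.0], budget=5000.0)
print(counts, spent)
```

In this sketch, `counts[m][k]` records how often arm `k` was played at fidelity `m`, so comparing the rows gives a rough picture of how the budget was split between cheap and expensive plays.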

Original language	English (US)
Pages (from-to)	1785-1793
Number of pages	9
Journal	Advances in Neural Information Processing Systems
State	Published - Jan 1 2016
Externally published	Yes
Event	30th Annual Conference on Neural Information Processing Systems, NIPS 2016 - Barcelona, Spain; Duration: Dec 5 2016 → Dec 10 2016

ASJC Scopus subject areas

Computer Networks and Communications
Information Systems
Signal Processing

Cite this

@article{3f3c761a16e242a4a287c2e3df2f0e70,
  title = "The multi-fidelity multi-armed bandit",
  abstract = "We study a variant of the classical stochastic K-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, the forecaster may choose to play an arm at any one of M fidelities. The highest fidelity (desired outcome) expends cost λ^(M). The m^th fidelity (an approximation) expends λ^(m) < λ^(M) and returns a biased estimate of the highest fidelity. We develop MF-UCB, a novel upper confidence bound procedure for this setting and prove that it naturally adapts to the sequence of available approximations and costs thus attaining better regret than naive strategies which ignore the approximations. For instance, in the above online advertising example, MF-UCB would use the lower fidelities to quickly eliminate suboptimal ads and reserve the larger expensive experiments on a small set of promising candidates. We complement this result with a lower bound and show that MF-UCB is nearly optimal under certain conditions.",
  author = "Kirthevasan Kandasamy and Gautam Dasarathy and Jeff Schneider and Barnab{\'a}s P{\'o}czos",
  year = "2016",
  month = jan,
  day = "1",
  language = "English (US)",
  pages = "1785--1793",
  journal = "Advances in Neural Information Processing Systems",
  issn = "1049-5258",
  note = "30th Annual Conference on Neural Information Processing Systems, NIPS 2016 ; Conference date: 05-12-2016 Through 10-12-2016",
}

TY  - JOUR
T1  - The multi-fidelity multi-armed bandit
AU  - Kandasamy, Kirthevasan
AU  - Dasarathy, Gautam
AU  - Schneider, Jeff
AU  - Póczos, Barnabás
PY  - 2016/1/1
Y1  - 2016/1/1
N2  - We study a variant of the classical stochastic K-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, the forecaster may choose to play an arm at any one of M fidelities. The highest fidelity (desired outcome) expends cost λ^(M). The m^th fidelity (an approximation) expends λ^(m) < λ^(M) and returns a biased estimate of the highest fidelity. We develop MF-UCB, a novel upper confidence bound procedure for this setting and prove that it naturally adapts to the sequence of available approximations and costs thus attaining better regret than naive strategies which ignore the approximations. For instance, in the above online advertising example, MF-UCB would use the lower fidelities to quickly eliminate suboptimal ads and reserve the larger expensive experiments on a small set of promising candidates. We complement this result with a lower bound and show that MF-UCB is nearly optimal under certain conditions.
AB  - We study a variant of the classical stochastic K-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, the forecaster may choose to play an arm at any one of M fidelities. The highest fidelity (desired outcome) expends cost λ^(M). The m^th fidelity (an approximation) expends λ^(m) < λ^(M) and returns a biased estimate of the highest fidelity. We develop MF-UCB, a novel upper confidence bound procedure for this setting and prove that it naturally adapts to the sequence of available approximations and costs thus attaining better regret than naive strategies which ignore the approximations. For instance, in the above online advertising example, MF-UCB would use the lower fidelities to quickly eliminate suboptimal ads and reserve the larger expensive experiments on a small set of promising candidates. We complement this result with a lower bound and show that MF-UCB is nearly optimal under certain conditions.
UR  - http://www.scopus.com/inward/record.url?scp=85018901939&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85018901939&partnerID=8YFLogxK
M3  - Conference article
AN  - SCOPUS:85018901939
SN  - 1049-5258
SP  - 1785
EP  - 1793
JO  - Advances in Neural Information Processing Systems
JF  - Advances in Neural Information Processing Systems
T2  - 30th Annual Conference on Neural Information Processing Systems, NIPS 2016
Y2  - 5 December 2016 through 10 December 2016
ER  -
