Maximizing and Satisficing in Multi-armed Bandits with Graph Information

Parth K. Thaker; Nikhil Rao; Mohit Malu; Gautam Dasarathy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options, and even obtaining one observation per option may be too costly, rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to similarity relationships amongst the options that can be leveraged. In this paper, we consider the pure exploration problem in stochastic multi-armed bandits where the similarities between the arms are captured by a graph and the rewards may be represented as a smooth signal on this graph. In particular, we consider the problem of finding the arm with the maximum reward (i.e., the maximizing problem) or one that has sufficiently high reward (i.e., the satisficing problem) under this model. We propose novel algorithms, GRUB (GRaph based UcB) and ζ-GRUB, for these problems and provide a theoretical characterization of their performance that specifically elicits the benefit of the graph side information. We also prove a lower bound on the data requirement that shows these algorithms are near-optimal for a large class of problems. We complement our theory with experimental results that show the benefit of capitalizing on such side information.
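As a rough illustration of the setting described in the abstract, the following Python sketch shows one common way to quantify how "smooth" a reward vector is on a graph, namely the Laplacian quadratic form, together with toy versions of the maximizing and satisficing targets. This is not the GRUB or ζ-GRUB algorithm. The function names, the path graph, the reward values, and the reading of ζ as a tolerance below the best reward are all assumptions made for illustration; the abstract does not specify them.

```python
# Illustrative sketch only. All names and numbers are hypothetical and
# are not taken from the paper.

def laplacian(n, edges):
    """Build the combinatorial graph Laplacian L = D - A as a list of rows."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def quadratic_form(mu, L):
    """Compute mu^T L mu directly from the Laplacian matrix."""
    n = len(mu)
    return sum(mu[i] * L[i][j] * mu[j] for i in range(n) for j in range(n))


def smoothness(mu, edges):
    """Same quantity as mu^T L mu, written as squared differences over edges."""
    return sum((mu[i] - mu[j]) ** 2 for i, j in edges)


def maximizing_arm(mu):
    """Index of the arm with the largest mean reward."""
    return max(range(len(mu)), key=lambda i: mu[i])


def satisficing_arms(mu, zeta):
    """Arms whose mean is within zeta of the best mean (assumed reading of zeta)."""
    best = max(mu)
    return [i for i, m in enumerate(mu) if m >= best - zeta]


# A path graph on four arms: 0 - 1 - 2 - 3 (hypothetical similarity graph).
edges = [(0, 1), (1, 2), (2, 3)]
L = laplacian(4, edges)

smooth_rewards = [0.50, 0.55, 0.60, 0.65]  # neighbors have similar means
rough_rewards = [0.50, 0.05, 0.90, 0.10]   # neighbors differ sharply

print(smoothness(smooth_rewards, edges), smoothness(rough_rewards, edges))
print(maximizing_arm(smooth_rewards), satisficing_arms(smooth_rewards, 0.06))
```

A small value of the quadratic form means neighboring arms have similar means, which is the kind of structure that lets observations of one arm inform estimates of its neighbors.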

Original language	English (US)
Title of host publication	Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Editors	S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Publisher	Neural information processing systems foundation
ISBN (Electronic)	9781713871088
State	Published - 2022
Event	36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States; Duration: Nov 28 2022 → Dec 9 2022

Publication series

Name	Advances in Neural Information Processing Systems
Volume	35
ISSN (Print)	1049-5258

Conference

Conference	36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/Territory	United States
City	New Orleans
Period	11/28/22 → 12/9/22

ASJC Scopus subject areas

Computer Networks and Communications
Information Systems
Signal Processing

Cite this

Thaker, P. K., Rao, N., Malu, M., & Dasarathy, G. (2022). Maximizing and Satisficing in Multi-armed Bandits with Graph Information. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022 (Advances in Neural Information Processing Systems; Vol. 35). Neural information processing systems foundation.

@inproceedings{485719e616944111873bc639adbde5f9,

title = "Maximizing and Satisficing in Multi-armed Bandits with Graph Information",

author = "Thaker, {Parth K.} and Nikhil Rao and Mohit Malu and Gautam Dasarathy",

note = "Funding Information: This work was supported in part by the National Science Foundation through the awards CCF-2048223, CNS-2003111, CCF-2029044, and OAC-1934766. This work was also supported partly by the ASU SenSIP Center. Publisher Copyright: {\textcopyright} 2022 Neural information processing systems foundation. All rights reserved.; 36th Conference on Neural Information Processing Systems, NeurIPS 2022 ; Conference date: 28-11-2022 Through 09-12-2022",

year = "2022",

language = "English (US)",

series = "Advances in Neural Information Processing Systems",

publisher = "Neural information processing systems foundation",

editor = "S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh",

booktitle = "Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022",

}

TY - GEN

T1 - Maximizing and Satisficing in Multi-armed Bandits with Graph Information

AU - Thaker, Parth K.

AU - Rao, Nikhil

AU - Malu, Mohit

AU - Dasarathy, Gautam

N1 - Funding Information: This work was supported in part by the National Science Foundation through the awards CCF-2048223, CNS-2003111, CCF-2029044, and OAC-1934766. This work was also supported partly by the ASU SenSIP Center. Publisher Copyright: © 2022 Neural information processing systems foundation. All rights reserved.

PY - 2022

Y1 - 2022

UR - http://www.scopus.com/inward/record.url?scp=85163197689&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85163197689&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85163197689

T3 - Advances in Neural Information Processing Systems

BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022

A2 - Koyejo, S.

A2 - Mohamed, S.

A2 - Agarwal, A.

A2 - Belgrave, D.

A2 - Cho, K.

A2 - Oh, A.

PB - Neural information processing systems foundation

T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022

Y2 - 28 November 2022 through 9 December 2022

ER -
