TY - JOUR
T1 - A General Framework for Decentralized Optimization with First-Order Methods
AU - Xin, Ran
AU - Pu, Shi
AU - Nedic, Angelia
AU - Khan, Usman A.
N1 - Funding Information:
Manuscript received January 11, 2020; revised May 17, 2020; accepted August 17, 2020. Date of current version October 27, 2020. The work of Ran Xin and Usman A. Khan was supported in part by NSF under Grant CMMI-1903972 and Grant CBET-1935555. The work of Shi Pu was supported in part by the Shenzhen Research Institute of Big Data (SRIBD) under Grant J00120190011. The work of Angelia Nedić was supported by NSF under Grant CCF-1717391. (Corresponding author: Usman A. Khan.) Ran Xin is with the Department of Electrical and Computer Engineering (ECE), Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: ranx@andrew.cmu.edu). Shi Pu is with the School of Data Science, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen 518172, China (e-mail: pushi@cuhk.edu.cn). Angelia Nedić is with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: angelia.nedich@asu.edu). Usman A. Khan is with the Department of Electrical and Computer Engineering (ECE), Tufts University, Medford, MA 02155 USA (e-mail: khan@ece.tufts.edu).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - Decentralized optimization to minimize a finite sum of functions, distributed over a network of nodes, has been a significant area within control and signal-processing research due to its natural relevance to optimal control and signal estimation problems. More recently, the emergence of sophisticated computing and large-scale data science needs has led to a resurgence of activity in this area. In this article, we discuss decentralized first-order gradient methods, which have found tremendous success in control, signal processing, and machine learning problems, where such methods, due to their simplicity, serve as the first method of choice for many complex inference and training tasks. In particular, we provide a general framework of decentralized first-order methods that is applicable to directed and undirected communication networks alike and show that much of the existing work on optimization and consensus can be related explicitly to this framework. We further extend the discussion to decentralized stochastic first-order methods that rely on stochastic gradients at each node and describe how local variance reduction schemes, previously shown to have promise in centralized settings, are able to improve the performance of decentralized methods when combined with what is known as gradient tracking. We motivate and demonstrate the effectiveness of the corresponding methods in the context of machine learning and signal-processing problems that arise in decentralized environments.
AB - Decentralized optimization to minimize a finite sum of functions, distributed over a network of nodes, has been a significant area within control and signal-processing research due to its natural relevance to optimal control and signal estimation problems. More recently, the emergence of sophisticated computing and large-scale data science needs has led to a resurgence of activity in this area. In this article, we discuss decentralized first-order gradient methods, which have found tremendous success in control, signal processing, and machine learning problems, where such methods, due to their simplicity, serve as the first method of choice for many complex inference and training tasks. In particular, we provide a general framework of decentralized first-order methods that is applicable to directed and undirected communication networks alike and show that much of the existing work on optimization and consensus can be related explicitly to this framework. We further extend the discussion to decentralized stochastic first-order methods that rely on stochastic gradients at each node and describe how local variance reduction schemes, previously shown to have promise in centralized settings, are able to improve the performance of decentralized methods when combined with what is known as gradient tracking. We motivate and demonstrate the effectiveness of the corresponding methods in the context of machine learning and signal-processing problems that arise in decentralized environments.
KW - Consensus
KW - decentralized optimization
KW - gradient descent (GD)
KW - machine learning
KW - stochastic methods
UR - http://www.scopus.com/inward/record.url?scp=85095718721&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095718721&partnerID=8YFLogxK
U2 - 10.1109/JPROC.2020.3024266
DO - 10.1109/JPROC.2020.3024266
M3 - Article
AN - SCOPUS:85095718721
SN - 0018-9219
VL - 108
SP - 1869
EP - 1889
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
IS - 11
M1 - 9241497
ER -