TY - JOUR
T1 - Energy-efficient acceleration of MapReduce applications using FPGAs
AU - Neshatpour, Katayoun
AU - Malik, Maria
AU - Sasan, Avesta
AU - Rafatirad, Setareh
AU - Mohsenin, Tinoush
AU - Ghasemzadeh, Hassan
AU - Homayoun, Houman
N1 - Funding Information:
Tinoush Mohsenin is an Assistant Professor in the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County. She received her Ph.D. from the University of California, Davis in 2010 and her M.S. degree from Rice University in 2004, both in Electrical and Computer Engineering. Prof. Mohsenin’s research focuses on designing highly accurate and energy-efficient embedded processors for machine learning, signal processing and knowledge extraction techniques for autonomous systems, wearable smart health monitoring, and embedded big data computing. She has over 80 peer-reviewed journal and conference publications and is the recipient of the NSF CAREER award in 2017, the best paper award at the GLSVLSI conference in 2016, and the best paper honorable mention award at ISCAS 2017 for developing domain-specific accelerators for biomedical, deep learning and cognitive computing. She currently leads 8 research projects in her lab, all funded by the National Science Foundation (NSF), Army Research Lab (ARL), Northrop Grumman, Boeing, Nvidia and Xilinx. She has served as associate editor of IEEE Transactions on Circuits and Systems-I (TCAS-I) and IEEE Transactions on Biomedical Circuits and Systems (TBioCAS). She was the local arrangement co-chair for the 50th IEEE International Symposium on Circuits and Systems (ISCAS) in Baltimore. She has also served as a technical program committee member of the IEEE International Solid-State Circuits Conference Student Research Preview (ISSCC-SRP), IEEE Biomedical Circuits and Systems (BioCAS), IEEE International Symposium on Circuits and Systems (ISCAS), ACM Great Lakes Symposium on VLSI (GLSVLSI) and IEEE International Symposium on Quality Electronic Design (ISQED) conferences. She also serves as secretary of IEEE P1890 on Error Correction Coding for Non-Volatile Memories.
Funding Information:
Katayoun Neshatpour is a Ph.D. student in the Department of Electrical and Computer Engineering at George Mason University. She is a recipient of the three-year Presidential Fellowship and a one-year supplemental ECE department scholarship. She received her M.Sc. degree in Electrical Engineering from Sharif University of Technology and her B.Sc. degree from Isfahan University of Technology. Her research interests are the acceleration of big data applications with a focus on the MapReduce platform, energy-efficient implementation of machine-learning applications including Convolutional Neural Networks, and low-power VLSI design.
Funding Information:
Setareh Rafatirad is an Assistant Professor in the IST department at George Mason University. Prior to joining George Mason, she spent four years as a Research Assistant at UC Irvine. Before that, she worked as a software developer on numerous industrial application systems and tools. A known expert in the field of data analytics and application design, she has published on a variety of topics related to big data and served on the panel of scientific boards. Setareh received her Ph.D. degree from the Department of Information and Computer Science at UC Irvine in 2012. She was the recipient of a three-year UC Irvine CS department chair fellowship. She received her M.S. degree from the Department of Information and Computer Science at UC Irvine in 2010.
Publisher Copyright:
© 2018 Elsevier Inc.
PY - 2018/9
Y1 - 2018/9
N2 - In this paper, we present a full end-to-end implementation of big data analytics applications in a heterogeneous CPU+FPGA architecture. Selecting the optimal architecture that results in the highest acceleration for big data applications requires an in-depth analysis of each application. Thus, we develop MapReduce implementations of K-means, K nearest neighbor, support vector machine and naive Bayes in a Hadoop Streaming environment, which allows developing mapper functions in a non-Java language suited for interfacing with an FPGA-based hardware acceleration environment. We further profile various components of Hadoop MapReduce to identify candidates for hardware acceleration. We accelerate the mapper functions through hardware+software (HW+SW) co-design. Moreover, we study how various parameters at the application (size of input data), system (number of mappers running simultaneously per node and data split size), and architecture (choice of CPU core such as big vs. little, e.g., Xeon vs. Atom) levels affect the performance and power-efficiency benefits of Hadoop Streaming hardware acceleration and the overall performance and energy-efficiency of the system. Promising speedup and energy-efficiency gains of up to 8.3× and 15×, respectively, are achieved in an end-to-end Hadoop implementation. Our results show that HW+SW acceleration yields significantly higher speedup on the Atom server, reducing the performance gap between little and big cores after the acceleration. On the other hand, HW+SW acceleration reduces the power consumption of the Xeon server more significantly, reducing the power gap between little and big cores. Our cost analysis shows that the FPGA-accelerated Atom server yields execution times that are close to or even lower than those of a stand-alone Xeon server for the studied applications, while reducing the server cost by more than 3×. 
We confirm the scalability of FPGA acceleration of MapReduce by increasing the data size on a 12-node Xeon cluster and show that FPGA acceleration maintains its benefit for larger data sizes on a cluster.
AB - In this paper, we present a full end-to-end implementation of big data analytics applications in a heterogeneous CPU+FPGA architecture. Selecting the optimal architecture that results in the highest acceleration for big data applications requires an in-depth analysis of each application. Thus, we develop MapReduce implementations of K-means, K nearest neighbor, support vector machine and naive Bayes in a Hadoop Streaming environment, which allows developing mapper functions in a non-Java language suited for interfacing with an FPGA-based hardware acceleration environment. We further profile various components of Hadoop MapReduce to identify candidates for hardware acceleration. We accelerate the mapper functions through hardware+software (HW+SW) co-design. Moreover, we study how various parameters at the application (size of input data), system (number of mappers running simultaneously per node and data split size), and architecture (choice of CPU core such as big vs. little, e.g., Xeon vs. Atom) levels affect the performance and power-efficiency benefits of Hadoop Streaming hardware acceleration and the overall performance and energy-efficiency of the system. Promising speedup and energy-efficiency gains of up to 8.3× and 15×, respectively, are achieved in an end-to-end Hadoop implementation. Our results show that HW+SW acceleration yields significantly higher speedup on the Atom server, reducing the performance gap between little and big cores after the acceleration. On the other hand, HW+SW acceleration reduces the power consumption of the Xeon server more significantly, reducing the power gap between little and big cores. Our cost analysis shows that the FPGA-accelerated Atom server yields execution times that are close to or even lower than those of a stand-alone Xeon server for the studied applications, while reducing the server cost by more than 3×. 
We confirm the scalability of FPGA acceleration of MapReduce by increasing the data size on a 12-node Xeon cluster and show that FPGA acceleration maintains its benefit for larger data sizes on a cluster.
KW - FPGA
KW - Hadoop
KW - Hardware+software co-design
KW - Machine learning
KW - MapReduce
KW - Zynq boards
UR - http://www.scopus.com/inward/record.url?scp=85045264459&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045264459&partnerID=8YFLogxK
U2 - 10.1016/j.jpdc.2018.02.004
DO - 10.1016/j.jpdc.2018.02.004
M3 - Article
AN - SCOPUS:85045264459
SN - 0743-7315
VL - 119
SP - 1
EP - 17
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
ER -