TY - GEN
T1 - Technical challenges of supporting interactive HPC
AU - Reuther, Albert
AU - Kepner, Jeremy
AU - MCcabe, Andy
AU - Mullen, Julie
AU - Bliss, Nadya T.
AU - Kim, Hahn
PY - 2007/12/1
Y1 - 2007/12/1
N2 - Users' demand for interactive, on-demand access to a large pool of high performance computing (HPC) resources is increasing. The majority of users at Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) are involved in the interactive development of sensor processing algorithms. This development often requires a large amount of computation due to the complexity of the algorithms being explored and/or the size of the data set being analyzed. These researchers also require rapid turnaround of their jobs because each iteration directly influences code changes made for the following iteration. Historically, batch queue systems have not been a good match for this kind of user. The Lincoln Laboratory Grid (LLGrid) system at MIT-LL is the largest dedicated interactive, on-demand HPC system in the world. While the system also accommodates some batch queue jobs, the vast majority of jobs submitted are interactive, on-demand jobs. Choosing between running a system with a batch queue or in an interactive, on-demand manner involves tradeoffs. This paper discusses the tradeoffs between operating a cluster as a batch system, an interactive, ondemand system, or a hybrid system. The LLGrid system has been operational for over three years, and now serves over 200 users from across Lincoln. The system has run over 100,000 interactive jobs. It has become an integral part of many researchers' algorithm development workflows. For instance, in batch queue systems, an individual user commonly can gain access to 25% of the processors in the system after the job has waited in the queue; in our experience with ondemand, interactive operation, individual users often can also gain access to 20-25% of the cluster processors. This paper will share a variety of the new data on our experiences with running an interactive, on-demand system that also provides some batch queue access.
AB - Users' demand for interactive, on-demand access to a large pool of high performance computing (HPC) resources is increasing. The majority of users at Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) are involved in the interactive development of sensor processing algorithms. This development often requires a large amount of computation due to the complexity of the algorithms being explored and/or the size of the data set being analyzed. These researchers also require rapid turnaround of their jobs because each iteration directly influences code changes made for the following iteration. Historically, batch queue systems have not been a good match for this kind of user. The Lincoln Laboratory Grid (LLGrid) system at MIT-LL is the largest dedicated interactive, on-demand HPC system in the world. While the system also accommodates some batch queue jobs, the vast majority of jobs submitted are interactive, on-demand jobs. Choosing between running a system with a batch queue or in an interactive, on-demand manner involves tradeoffs. This paper discusses the tradeoffs between operating a cluster as a batch system, an interactive, ondemand system, or a hybrid system. The LLGrid system has been operational for over three years, and now serves over 200 users from across Lincoln. The system has run over 100,000 interactive jobs. It has become an integral part of many researchers' algorithm development workflows. For instance, in batch queue systems, an individual user commonly can gain access to 25% of the processors in the system after the job has waited in the queue; in our experience with ondemand, interactive operation, individual users often can also gain access to 20-25% of the cluster processors. This paper will share a variety of the new data on our experiences with running an interactive, on-demand system that also provides some batch queue access.
KW - Cluster computing
KW - Grid computing
KW - Interactive high performance computing
KW - On-demand
KW - Parallel MATLAB
UR - http://www.scopus.com/inward/record.url?scp=49949110340&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49949110340&partnerID=8YFLogxK
U2 - 10.1109/HPCMP-UGC.2007.72
DO - 10.1109/HPCMP-UGC.2007.72
M3 - Conference contribution
AN - SCOPUS:49949110340
SN - 0769530885
SN - 9780769530888
T3 - Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC
SP - 403
EP - 409
BT - Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program
T2 - Department of Defense - HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC
Y2 - 18 June 2007 through 21 June 2007
ER -