ScalaJack: Scalable Trace-Based Tools for In-Situ Data Analysis of HPC Applications

funded by: NSF (award abstract)
funding level: $457,395
duration: 06/01/2012 - 05/31/2015 (no-cost extension until 05/31/2017)
PI: Frank Mueller

Production codes on supercomputers struggle to remain scalable each time the processor core count increases by a factor of 10, even though they run efficiently at smaller scale. Yet root cause diagnosis fails at petascale because (1) symptoms of performance problems can be subtle, (2) only a few metrics can be collected efficiently, and (3) tools can feasibly record only a small subset of even these metrics.

This work addresses these problems by creating a framework that lets application developers focus on the data analysis itself: that analysis drives customized data extraction, combined with on-the-fly analysis geared specifically to their individual problems. This is accomplished by combining trace analysis and in-situ data analysis techniques at runtime, thereby lifting data reduction to a new level where the reduction itself IS the analysis. With this approach, modular measurement and analysis components are combined to selectively extract representative data from production codes in a problem-specific manner, which enables root cause analysis.

The work demonstrates the feasibility of customized data extraction and analysis at scale for root cause analysis on current and forthcoming multi-petascale supercomputers. It thus helps sustain scalable scientific computing into the future, up to the largest scales. Results of this work will be contributed as open-source code to the research community and beyond, allowing other groups not only to build tools on top of our framework but also to contribute their own components.

Publications:

"FuncyTuner: Auto-tuning Scientific Applications With Per-loop Compilation" by Tao Wang, Nikhil Jain, David Beckingsale, David Boehme, Frank Mueller, Todd Gamblin in International Conference on Parallel Processing (ICPP), Aug 2019.
"HiDP: A Hierarchical Data Parallel Language" by Y. Zhang and F. Mueller in International Symposium on Code Generation and Optimization (CGO), Feb 2013.
"Elastic and Scalable Tracing and Accurate Replay of Non-Deterministic Events" by X. Wu, F. Mueller in International Conference on Supercomputing (ICS), Jun 2013.
"ScalaJack: Customized Scalable Tracing with in-situ Data Analysis" by S. Ananthakrishnan, Frank Mueller in Euro-Par Conference, Aug 2014.
"Scalable Tracing of MPI Programs through Signature-Based Clustering Algorithms" by A. Bahmani, F. Mueller in International Conference on Supercomputing (ICS), Jun 2014.
"ACURDION: An Adaptive Clustering-based Algorithm for Tracing Large-scale MPI Applications" by A. Bahmani, F. Mueller in IEEE Big Data, Oct 2015.
"HPC I/O Trace Extrapolation" by Xiaoqing Luo, Frank Mueller, Philip Carns, John Jenkins, Robert Latham, Robert Ross, Shane Snyder in Workshop on Extreme-Scale Programming Tools (ESPT15), Nov 2015.
"SparkScore: Leveraging Apache Spark for Distributed Genomic Inference" by Amir Bahmani, Alex B. Sibley, Mahmoud Parsian, Kouros Owzar, Frank Mueller in Workshop on High Performance Computational Biology (HiCOMB16), May 2016.
"Performance Analysis of a Multi-Tenant In-memory Data Grid" by Anwesha Das, Frank Mueller, Xiaohui Gu, Arun Iyengar in IEEE Cloud, Jun/Jul 2016.
"Efficient Clustering for Ultra-Scale Application Tracing" by A. Bahmani, F. Mueller in Journal of Parallel and Distributed Computing (JPDC), V ??, No ?, Aug 2016, pages ???, DOI 10.1016/j.jpdc.2016.08.001, accepted.
"Power Tuning HPC Jobs on Power-Constrained Systems" by Neha Gholkar, Frank Mueller, Barry Rountree in International Conference on Parallel Architecture and Compilation Techniques (PACT), Sep 2016.
"Benchmark Generation and Simulation at Extreme Scale" by Mahesh Lagadapati, Frank Mueller, Christian Engelmann in International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Sep 2016, pages 9-18.
"ScalaIOExtrap: Elastic I/O Tracing and Extrapolation" by Xiaoqing Luo, Frank Mueller, Philip Carns, Jonathan Jenkins, Robert Latham, Robert Ross, Shane Snyder in International Parallel and Distributed Processing Symposium (IPDPS), May 2017.

Theses:

"Exploiting Data-Parallelism in GPUs" by Y. Zhang, Ph.D. Thesis, North Carolina State University, Sep 2012 (last known position: Stone Ridge Technologies, MD)
"Scalable Communication Tracing for Performance Analysis of Parallel Applications" by X. Wu, Ph.D. Thesis, North Carolina State University, Dec 2012 (last known position: Amazon, WA)
"Customized Scalable Tracing with in-situ Data Analysis" by Srinash Krishna Ananthakrishnan, M.S. Thesis, North Carolina State University, May 2013 (last known position: Riverbed Technologies, CA)
"ScalaIOExtrap: Elastic I/O Tracing and Extrapolation" by Xiaoqing Luo, M.S. Thesis, North Carolina State University, Jun 2015 (last known position: TBD)
"Scalable Communication Tracing via Clustering" by A. Bahmani, Ph.D. Thesis, North Carolina State University, May 2017 (last known position: research staff, Stanford Univ., CA)

Other:

"Scalable Performance Analysis of ExaScale MPI Programs through Signature-Based Clustering Algorithms" by Amir Bahmani, Frank Mueller, refereed poster at Supercomputing, Nov 2013.

"This material is based upon work supported by the National Science Foundation under Grant No. 1217748."

"Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."