Today, high-performance clusters of shared-memory multiprocessors (SMPs) are employed to cope with large data sets for scientific applications. On these SMPs, hybrid programming models combining message passing and shared memory are often less efficient than pure message passing, although the former fit SMP architectures more closely.
The objective of this work is to determine the sources of inefficiency in utilizing the memory hierarchies of SMPs and to optimize memory behavior. The novelty lies in the reliance on dynamic binary rewriting, i.e., performance analysis and tuning are performed on the application while it executes.
The technical challenges are to
The key intellectual merit is in providing additional, dynamic optimizations for long-running applications. The broader impact of this work lies in its contribution to countering the widening gap between processor and main-memory speeds by fully exploiting software optimizations.
"Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."