Qi will present a paper from PLDI 2019. Paper link: https://dl.acm.org/doi/10.1145/3314221.3314621
Abstract: This paper proposes TaskProf2, a parallelism profiler and an adviser for task parallel programs. As a parallelism profiler, TaskProf2 pinpoints regions with serialization bottlenecks, scheduling overheads, and secondary effects of execution. As an adviser, TaskProf2 identifies regions that matter in improving parallelism. To accomplish these objectives, it uses a performance model that captures series-parallel relationships between various dynamic execution fragments of tasks and includes fine-grained measurement of computation in those fragments. Using this performance model, TaskProf2’s what-if analyses identify regions that improve the parallelism of the program while considering tasking overheads. Its differential analyses perform fine-grained differencing of an oracle and the observed performance model to identify static regions experiencing secondary effects. We have used TaskProf2 to identify regions with serialization bottlenecks and secondary effects in many applications.
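The paper's performance model builds on series-parallel relationships between execution fragments; the classic work/span computation over such a structure illustrates the idea. The sketch below is illustrative only (the tree shape and cost values are invented assumptions, not TaskProf2's actual data structures):

```python
# Illustrative sketch: computing work and span over a series-parallel
# tree of execution fragments (not TaskProf2's actual implementation).
# Leaves carry measured computation cost; internal nodes compose
# their children either in series or in parallel.

def work(node):
    """Total computation summed across all fragments."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    return sum(work(c) for c in children)

def span(node):
    """Length of the critical path, i.e., the serialization bottleneck."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "series":
        return sum(span(c) for c in children)
    return max(span(c) for c in children)  # parallel composition

# A hypothetical fragment tree: 10 units serial, then two parallel branches.
tree = ("series", [10, ("parallel", [40, 20])])
print(work(tree))               # 70
print(span(tree))               # 50
print(work(tree) / span(tree))  # parallelism = 1.4
```

A what-if analysis in this style would hypothetically shrink a fragment's cost and recompute work/span to see whether overall parallelism improves.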
Abstract: Natural Language (NL) programming automatically synthesizes code from inputs expressed in natural language and has recently attracted growing interest. Recent solutions, however, are all data-driven and require many labeled training examples. This paper proposes an NLU-driven approach, a new approach inspired by how humans learn programming. It centers around Natural Language Understanding and draws on a novel graph-based mapping algorithm, foregoing the need for large numbers of labeled examples. The resulting NL programming framework, HISyn, uses no training examples yet achieves synthesis accuracy comparable to that of data-driven methods trained on hundreds of examples. HISyn meanwhile demonstrates advantages in interpretability, error diagnosis support, and cross-domain extensibility.
This is a practice talk for FSE 2020.
CDL: Classified Distributed Learning for Detecting Security Attacks in Containerized Applications
Abstract: Containers have been widely adopted in production computing environments for their efficiency and low isolation overhead. However, recent studies have shown that containerized applications are prone to various security attacks. Moreover, containerized applications are often highly dynamic and short-lived, which further exacerbates the problem. In this paper, we present CDL, a classified distributed learning framework that achieves efficient security attack detection for containerized applications. CDL integrates online application classification and anomaly detection to overcome the lack of sufficient training data for dynamic, short-lived containers while accounting for the diverse normal behaviors of different applications. We have implemented a prototype of CDL and evaluated it on 33 real-world vulnerability attacks in 24 commonly used server applications. Our experimental results show that CDL reduces the false positive rate from over 12% to 0.24% compared to a traditional anomaly detection scheme without aggregated training data. Compared to distributed learning without application classification, CDL improves the detection rate from 20 out of 33 attacks to 31 out of 33 attacks caught before they compromise the server systems. CDL is lightweight, completing application classification and anomaly detection within a few milliseconds.
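The gist of classifying applications before detecting anomalies can be illustrated with a toy per-class baseline model. This is a simplified sketch, not CDL's actual algorithm, and the application classes and metric values below are invented for illustration:

```python
# Toy sketch of classified learning (not CDL's actual algorithm):
# group behavior samples by application class, learn a per-class
# baseline, and flag a sample that deviates from its own class's
# normal behavior rather than from a single global model.
from statistics import mean, pstdev

def fit_per_class(samples):
    """samples: list of (app_class, value). Returns per-class (mean, stddev)."""
    by_class = {}
    for cls, v in samples:
        by_class.setdefault(cls, []).append(v)
    return {cls: (mean(vs), pstdev(vs)) for cls, vs in by_class.items()}

def is_anomalous(model, cls, value, k=3.0):
    """Flag value if it lies more than k standard deviations from its class mean."""
    m, s = model[cls]
    return abs(value - m) > k * max(s, 1e-9)

# Hypothetical training data: two application classes with distinct baselines.
training = [("web", 10), ("web", 12), ("web", 11),
            ("db", 100), ("db", 104), ("db", 98)]
model = fit_per_class(training)
print(is_anomalous(model, "web", 50))  # True: far outside the web class baseline
print(is_anomalous(model, "web", 11))  # False: consistent with web behavior
```

Pooling all samples into one global model would blur the two baselines together, which is the false-positive problem per-class learning avoids.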
This is a practice talk for ACSAC 2020.
HangFix: Automatically Fixing Software Hang Bugs for Production Cloud Systems
Abstract: Software hang bugs are notoriously difficult to debug and often cause serious service outages in cloud systems. In this paper, we present HangFix, a software hang bug fixing framework that automatically fixes hang bugs triggered and detected in production cloud environments. HangFix first leverages stack trace analysis to localize the hang function and then performs root cause pattern matching to classify hang bugs into different types based on their likely root causes. Next, HangFix generates effective code patches based on the identified root cause patterns. We have implemented a prototype of HangFix and evaluated it on 42 real-world software hang bugs in 10 commonly used cloud server applications. Our results show that HangFix successfully fixes 40 out of 42 hang bugs in seconds.
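The localize-classify-patch pipeline described in the abstract can be sketched in a few lines. Everything below is hypothetical for illustration (the patterns, patch templates, and function names are invented, not HangFix's actual rules):

```python
# Minimal sketch of a HangFix-style pipeline (illustration only):
# 1) pick the likely hang function from a stack trace,
# 2) match it against root-cause patterns,
# 3) propose a patch template for the matched pattern.
import re

ROOT_CAUSE_PATTERNS = {
    # hypothetical pattern -> hypothetical patch template
    r"read|recv": "add a timeout to the blocking I/O call",
    r"lock|acquire": "acquire locks in a fixed global order",
    r"retry": "bound the retry loop with a maximum attempt count",
}

def localize_hang_function(stack_trace):
    """Assume the innermost application frame is the hang function."""
    return stack_trace[0]

def classify_and_patch(stack_trace):
    func = localize_hang_function(stack_trace)
    for pattern, fix in ROOT_CAUSE_PATTERNS.items():
        if re.search(pattern, func):
            return func, fix
    return func, "no known root-cause pattern matched"

trace = ["socket_recv_loop", "handle_request", "main"]
print(classify_and_patch(trace))  # matches the blocking-I/O pattern
```

The real system classifies by likely root cause rather than by function name, but the shape of the pipeline is the same: localization feeds classification, and classification selects a patch strategy.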
This is a practice talk for SOCC 2020.
Abstract: The big memory platform is emerging, as evidenced by Intel Optane DC persistent memory-based systems providing up to 9 TB of memory per machine and Amazon EC2 high-memory instances providing up to 24 TB per machine. However, the impact of such big memory platforms on high performance computing (HPC) applications is largely unknown. Is the big memory platform useful for HPC applications? On the one hand, the big memory platform enables scientific simulations at larger problem scales because of its large memory capacity. On the other hand, we observe that in production supercomputers, 90% of jobs utilize less than 15% of the node memory capacity, and for 90% of the time, memory utilization is less than 35%; many computation-intensive HPC applications cannot benefit from the big memory system. In this talk, we discuss the challenges and opportunities that the big memory platform brings to HPC applications. We use molecular dynamics (MD) simulation, a computation-intensive application, as a case study. We introduce a memoization framework (named MD-PM) that trades large memory capacity for high computation capability. Evaluated with nine realistic MD simulation problems on Optane DC PM, MD-PM consistently outperforms the state-of-the-art MD simulation package LAMMPS with an average speedup of 22.96x. The big memory system has great potential to accelerate HPC applications.
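The memory-for-computation trade at the heart of MD-PM is memoization: caching results of expensive, repeatedly invoked computations in (big) memory so repeated inputs skip recomputation. A generic sketch (not MD-PM's actual implementation; the kernel and inputs are invented stand-ins):

```python
# Generic memoization sketch of the memory-for-computation trade-off
# (not MD-PM's actual code): results of an expensive kernel are cached
# so that repeated inputs are answered from memory instead of recomputed.
import functools

calls = 0

@functools.lru_cache(maxsize=None)  # unbounded cache: grows with available memory
def expensive_kernel(x):
    """Stand-in for a costly per-input computation that recurs in a simulation."""
    global calls
    calls += 1
    return x * x + 3 * x

inputs = [1, 2, 1, 3, 2, 1]          # repeated inputs are common in practice
results = [expensive_kernel(x) for x in inputs]
print(results)  # [4, 10, 4, 18, 10, 4]
print(calls)    # 3: only the distinct inputs were actually computed
```

The larger the memory, the larger the cache can grow before eviction, which is why multi-terabyte platforms make this trade attractive for computation-bound workloads.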
Bio: Dong Li is an associate professor in EECS at the University of California, Merced. Previously, he was a research scientist at Oak Ridge National Laboratory (ORNL), studying computer architecture and programming models for next-generation supercomputer systems. Dong earned his PhD in computer science from Virginia Tech. His research focuses on high performance computing (HPC) and maintains a strong relevance to computer systems. The core theme of his research is how to enable scalable and efficient execution of scientific applications on increasingly complex large-scale parallel systems. Dong received a CAREER Award from the U.S. National Science Foundation in 2016 and an ORNL/CSMD Distinguished Contributor Award in 2013. His SC'14 paper was nominated for the best student paper award. He is also the lead PI for the NVIDIA CUDA Research Center at UC Merced and a review board member of IEEE Transactions on Parallel and Distributed Systems (TPDS).