CSC 801 (001) | Spring 2023 (by Man-Ki Yoon)

Systems Seminar

Schedule

Date/Time | Location | Speaker | Note
Friday, Jan 13, 1:00 PM | EB2 3001 | Xiaohui (Helen) Gu and Xipeng Shen | See below
Friday, Jan 20, 10:00 AM | EB2 3211 | Wesley Guez Assunção (Johannes Kepler University Linz) | Detail
Friday, Jan 27, 10:00 AM | EB2 3211 | Hamid Bagheri (University of Nebraska - Lincoln) | Detail
Thursday, Feb 2, 10:00 AM | EB2 3211 | Liang He (University of Colorado - Denver) | Detail
Thursday, Feb 9, 10:00 AM | EB2 3211 | Nivedita Arora (Georgia Institute of Technology) | Detail
Friday, Feb 17, 1:00 PM | EB2 3001 | Yuanchao Xu | HPCA ’23
Monday, Feb 27, 10:00 AM | EB2 3211 | Chenhan Xu (State University of New York at Buffalo) | Detail
Friday, Mar 3, 10:00 AM | EB2 3211 | Mo Sha (Florida International University) | Detail
Friday, Mar 10, 1:00 PM | EB2 3001 | Jakub Szefer (Yale) | See below
Friday, Mar 17 | (you pick) | Spring Break | (No meeting)
Friday, Mar 24 | (No meeting) | (No meeting) |
Friday, Mar 31, 1:00 PM | EB2 3001 | Siva Hari (NVIDIA) | See below
Monday, Apr 3, 10:30 AM | EB2 3001 | Milind Chabbi (Uber) | See below
Friday, Apr 7, 1:00 PM | EB2 3001 | Michael LeMay (Intel Labs) | See below
Friday, Apr 14, 1:00 PM | EB2 3001 | Ennan Zhai (Alibaba Cloud) | See below
Friday, Apr 21, 1:30 PM | EB2 3211 | Xu Liu | See below
Monday, Apr 24, 2:30 PM | EB2 3211 | Juan Gómez Luna | See below

More Details

Jan 13, 1:00 PM at EB2 3001

Title: Real World Production Incident Prediction and Root Cause Analysis using Unsupervised Machine Learning
Speaker: Xiaohui (Helen) Gu
Abstract: In this talk, I will present the InsightFinder Unified Intelligence Engine (UIE), which provides an AI-driven predictive observability platform for pinpointing incident root causes and for predicting and preventing production incidents. Powered by patented self-tuning unsupervised machine learning, InsightFinder continuously learns from raw logs, metrics, and application performance traces to localize root causes and predict incidents from the source. Companies of all sizes have embraced the platform and seen that business-impacting incidents can be predicted hours ahead, with clearly pinpointed root causes.


Title: How Coarsening Speeds up Differentiable Programming by 100X
Speaker: Xipeng Shen
Abstract: This talk presents a novel optimization for differentiable programming named coarsening optimization, created by Xipeng Shen when he collaborated with a team at Meta during his sabbatical. This technique offers a systematic way to synergize symbolic differentiation and algorithmic differentiation (AD). Through it, the granularity of the computations differentiated by each step in AD can become much larger than a single operation, leading to much less runtime computation and data allocation in AD. To circumvent the difficulties that control flow creates for symbolic differentiation in coarsening, this work introduces $\phi$-calculus, a novel method to allow symbolic reasoning and differentiation of computations that involve branches and loops. It further avoids “expression swell” in symbolic differentiation and balances reuse and coarsening through the design of reuse-centric segment-of-interest identification. Experiments on a collection of real-world applications show that coarsening optimization is effective in speeding up AD, producing speedups ranging from several times to two orders of magnitude.
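
To make the granularity idea in the abstract above concrete, here is a minimal toy sketch in Python. It is not the speaker's system and does not implement $\phi$-calculus; the names `Tape`, `Var`, `poly_fine`, `poly_coarse`, and `run` are hypothetical, and the symbolic derivative of the coarse segment is worked out by hand rather than by a compiler. It only shows that differentiating a whole segment as one step records one tape node where operation-by-operation reverse-mode AD records five.

```python
# Toy reverse-mode AD: every recorded operation allocates one tape node.
class Tape:
    def __init__(self):
        # Each entry is (output_var, [(input_var, local_gradient), ...]).
        self.nodes = []


class Var:
    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def _record(self, value, parents):
        out = Var(value, self.tape)
        self.tape.nodes.append((out, parents))
        return out

    def __add__(self, other):
        return self._record(self.value + other.value,
                            [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return self._record(self.value * other.value,
                            [(self, other.value), (other, self.value)])


def backward(out):
    # Walk the tape in reverse and accumulate gradients via the chain rule.
    out.grad = 1.0
    for node, parents in reversed(out.tape.nodes):
        for parent, local in parents:
            parent.grad += local * node.grad


def poly_fine(x):
    # y = x^3 + x^2 + x, differentiated one primitive operation at a time.
    return x * x * x + x * x + x


def poly_coarse(x):
    # Same function as ONE coarse node. The derivative 3x^2 + 2x + 1 is
    # assumed to have been obtained symbolically ahead of time (by hand here).
    v = x.value
    return x._record(v ** 3 + v ** 2 + v, [(x, 3 * v ** 2 + 2 * v + 1)])


def run(f, x0):
    # Returns (function value, gradient at x0, number of tape nodes recorded).
    tape = Tape()
    x = Var(x0, tape)
    y = f(x)
    backward(y)
    return y.value, x.grad, len(tape.nodes)


print(run(poly_fine, 2.0))
print(run(poly_coarse, 2.0))
```

Both calls return the same value and gradient; only the node count differs. That count stands in for the runtime computation and allocation the abstract says coarsening reduces.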

Jan 20, 10:00 AM at EB2 3211

Title: “Microservicification” of Legacy Systems: industrial needs, automated support, and developers
Speaker: Wesley Guez Assunção (Johannes Kepler University Linz)
Abstract: See here for more details.

Jan 27, 10:00 AM at EB2 3211

Title: Practical Formal Analysis of Software-Intensive Systems
Speaker: Hamid Bagheri (University of Nebraska - Lincoln)
Abstract: See here for more details.

Feb 2, 10:00 AM at EB2 3211

Title: Batteries Beyond Batteries
Speaker: Liang He (University of Colorado - Denver)
Abstract: See here for more details.

Feb 9, 10:00 AM at EB2 3211

Title: Towards Sustainable Computational Material and Things
Speaker: Nivedita Arora (Georgia Institute of Technology)
Abstract: See here for more details.

Feb 17, 1:00 PM at EB2 3001

Title: Reconciling Selective Logging and Hardware Persistent Memory Transaction (HPCA 2023)
Speaker: Yuanchao Xu

Feb 27, 10:00 AM at EB2 3211

Title: Towards Precision Sensing in IoT for Human Interaction, Healthcare, and Beyond
Speaker: Chenhan Xu (State University of New York at Buffalo)
Abstract: See here for more details.

Mar 3, 10:00 AM at EB2 3211

Title: Dependable Industrial Cyber-Physical Systems
Speaker: Mo Sha (Florida International University)
Abstract: See here for more details.

Mar 10, 1:00 PM at EB2 3001

Title: Quantum Computer Architectures and Hardware Security
Speaker: Jakub Szefer (Yale University)
Abstract: As quantum computer device research continues to advance rapidly, there are also advances at the other levels of the computer system stack that involve these devices. In particular, more and more quantum computer devices are becoming available as cloud-based services through IBM Quantum, Amazon Braket, Microsoft Azure, and others. In parallel, researchers have put forward ideas about multi-programming of quantum computer devices, where a single device can be shared by multiple programs, or even multiple users. While all of these advances make quantum computer devices more easily accessible and increase utilization, they open up the devices to various security threats. The objective of this seminar is to introduce the audience to recent research on the security of quantum computer architectures and hardware, demonstrating recent security attack prototypes as well as defenses. The focus of the seminar will be on superconducting qubit quantum computers; however, the security ideas can be applied to other types of quantum computers. Another goal of the seminar is to motivate discussion about quantum computer architecture and hardware security and to make connections between the quantum computer research community and the architecture and hardware security research community to help develop secure quantum computers of the future.

Mar 31, 1:00 PM at EB2 3001

Title: Ensuring Safety and Resilience of GPU-based AI Systems
Speaker: Siva Hari (NVIDIA)
Abstract: The use of GPUs has been pivotal in driving the progress of AI. AI has been enabling new autonomous capabilities in vehicles, robots, and drones, significantly expanding their practical applications. However, with the growing adoption of GPUs in safety-critical systems, it is important to analyze the safety and resilience of these applications in the face of unforeseen events. This presentation will provide an overview of transient hardware errors, which are one source of errors, and of the assessment tools and methodologies developed to analyze their effects on system safety. It will also cover error mitigation techniques being developed to improve safety. In recent years, significant research has been conducted in this domain, and the talk will highlight some of these developments, with a focus on the resilience and safety of DNN and AV systems. The presenter will also discuss outstanding problems that can benefit from further research.

Apr 3, 10:30 AM at EB2 3001

Title: Demystifying Golang Concurrency Bugs
Speaker: Milind Chabbi (Uber)
Abstract: Golang is the language of choice at Uber for developing microservices and infrastructure tools. Golang brings concurrent programming to the masses. The presence of shared memory alongside message passing makes Go a unique programming language. Writing correct and efficient parallel programs is hard; Golang does not guarantee the correctness or efficiency of parallel programs. In this talk, I will discuss the landscape of concurrency bugs in Golang and how they impact Uber production software systems. Using data races as an example of shared-memory concurrency bugs and deadlocks as an example of message-passing bugs, I will discuss the tools we have designed and deployed over the past several months to detect and mitigate these bugs at Uber. The insights gained from close inspection of concurrency bugs in Golang reveal the complex interplay between language design and concurrency bugs.

Apr 7, 1:00 PM at EB2 3001

Title: Deeply Hardening Software with Cryptography
Speaker: Michael LeMay (Intel Labs)
Abstract: This talk will describe the evolving software exploit landscape, some of the mitigations that have been or are being developed, and active areas of research. This will include an overview of Cryptographic Capability Computing (C3), which was introduced in a MICRO 2021 paper and is now the focus for a project led by Intel Labs as part of the DARPA HARDEN program.

Apr 14, 1:00 PM at EB2 3001

Title: Reliability and Agility in Alibaba Global Network
Speaker: Ennan Zhai (Alibaba Cloud)
Abstract: As one of the largest cloud service providers, Alibaba Cloud serves over one billion customers around the world. To ensure quality service at this scale, the underlying network infrastructure is of critical importance. This talk focuses on two of the essential aspects in our network management and operation: reliability and agility. I will present recent examples of our efforts in these areas, including (1) our configuration verification system, Hoyan Jinjing (published in SIGCOMM’20 and SIGCOMM’19), built to ensure the reliability of our global-scale network, and (2) our new cutting-edge programmable data-plane compiler, Lyra (published in SIGCOMM’20), designed to support flexible data plane programming on heterogeneous programmable ASICs. In addition, I will touch on the lessons we have learned and on open questions in these areas.

Apr 21, 1:30 PM at EB2 3211

Title: Performance Analysis and Optimization from Tools’ Perspective
Speaker: Xu Liu
Abstract: Performance optimization is a long-standing research topic in computer systems. In this talk, I will introduce my ongoing work on developing program analysis tools for performance measurement (also known as profilers). I will describe various tools developed in my group to analyze program performance in different dimensions and preview some future research directions.

Apr 24, 2:30 PM at EB2 3211

Title: Understanding a Modern Processing-in-Memory Architecture: Benchmarking and Experimental Analysis
Speaker: Juan Gómez Luna (ETH)
Abstract: Processing-in-memory (PIM) is becoming a reality which promises to overcome the data movement bottleneck (i.e., the waste of execution cycles and energy due to frequent movement of data between memory and compute units) by equipping compute systems with compute-capable memories. Several major vendors and startups have prototyped and announced their PIM architectures. Among them, the UPMEM company commercializes the first publicly-available real-world PIM architecture. This architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip. In this talk, we will provide an overview of the first comprehensive analysis of the first publicly-available real-world PIM architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains (e.g., dense/sparse linear algebra, databases, data analytics, graph processing, neural networks, bioinformatics, image processing), which we identify as memory-bound. We evaluate the performance and scaling characteristics of PrIM benchmarks on the UPMEM PIM architecture, and compare their performance and energy consumption to their state-of-the-art CPU and GPU counterparts. Our extensive evaluation conducted on two real UPMEM-based PIM systems with 640 and 2,556 DPUs provides new insights about the suitability of different workloads to the PIM system, programming recommendations for software designers, and suggestions and hints for hardware and architecture designers of future PIM systems.