CSC 801 – Spring 2025 Systems Seminar

Isabella Hu and Brian Qiu

Details:
- Jan 31, 2025, 12:00 p.m. via Google Meet
- 3211 EB2
Title: Tools for Taming Planet-Scale Computing: Maximizing Data Center Efficiency for the AI Era
Speakers: Isabella Hu and Brian Qiu
Abstract: The increasing computational demands of AI are forcing a paradigm shift in how we design and operate data centers. This talk introduces the critical tools needed to navigate the complexities of planet-scale computing. We’ll explore how traces, metrics, and profiles provide a comprehensive view of distributed system performance. This talk will cover how these instrumentation tools empower us to identify bottlenecks, optimize resource allocation, and ultimately achieve significant gains in data center efficiency, ensuring that our infrastructure thrives in the face of the AI revolution.
Speaker bios: Isabella Hu is a Software Engineer at Google and earned a Master’s degree from Carnegie Mellon University. Her work focuses on optimizing computing performance and AI-driven decision making within this domain. Brian Qiu is a Software Engineer at Google with 3 years of experience in system performance and distributed systems. His work focuses on scaling infrastructure to handle fleetwide tracing data.

Dr. Cong Guo

Details:
- Mar 7, 2025, 12:00 p.m. IN PERSON ONLY
- 3211 EB2
Title: Unlocking New Opportunities in Quantization and Sparsity Co-Design for Large Language Models
Speaker: Dr. Cong Guo
Abstract: The rapid expansion of Large Language Models (LLMs) has significantly advanced natural language processing. However, the increasing size of these models has led to inference costs that outpace the development of acceleration hardware. To address this challenge, various techniques have been proposed, including structured sparsity patterns to enhance execution efficiency and adaptive numerical data types to balance precision and performance. Additionally, methods focusing on managing outliers in model data have been developed to maintain accuracy during quantization. Building upon these foundations, I will introduce our latest work, Transitive Array, a novel framework that unifies quantization and sparsity. Transitive Array minimizes redundant computations and optimizes memory usage, offering a hardware-friendly solution for efficient LLM inference. This advancement presents a new opportunity to co-design quantization and sparsity in LLMs, effectively bridging the gap between escalating model complexities and current hardware limitations.
Speaker bio: Cong Guo is a Postdoctoral Associate at Duke University, collaborating with Professors Hai Li and Yiran Chen. He earned his Ph.D. in Computer Science from Shanghai Jiao Tong University and was honored with the 2023 Shanghai Jiao Tong University Outstanding Doctoral Dissertation Award. Cong Guo’s research interests lie in computer architecture and high-performance computing, with a focus on software-hardware co-optimization to accelerate efficient artificial intelligence applications. His work includes designing novel architectures and systems for neural networks, particularly in the areas of sparsity and quantization. Over the past five years, he has published more than 10 papers in leading conferences such as ISCA, MICRO, HPCA, and ASPLOS. His work received an Honorable Mention in the 2022 IEEE Micro Top Picks.

Abdullah Al Arafat

Details:
- April 11, 2025, 12:00 p.m. IN PERSON ONLY
- 3211 EB2
Title: Soteria: A Formal Digital-Twin-Enabled Framework for Safety-Assurance of Latency-Aware Cyber-Physical Systems
Speaker: Abdullah Al Arafat
Abstract: Verifying the safety of latency-aware cyber-physical systems is both critical and challenging due to the interaction between continuous physical dynamics and discrete computational constraints. This paper introduces SOTERIA, a formal framework that integrates digital twins for ensuring safety in these systems. SOTERIA models both the physical dynamics and computational behavior, enabling integrated verification within a specific operating environment. This approach goes beyond conventional methods that either treat physical and computational aspects separately or rely on overly conservative worst-case analyses. By modeling hybrid dynamics alongside computational models and operating environments, SOTERIA verifies both functional and timing correctness. Leveraging established verification tools, SOTERIA determines whether end-to-end latencies meet formal specifications, bridging the gap between computational and physical requirements. We first introduce a simple example of a 1D adaptive cruise control system to illustrate its effectiveness. We then present findings from a case study using the F1Tenth racing car platform and the UPPAAL tool to demonstrate SOTERIA’s effectiveness in realistic scenarios, enabling safety verification that was previously infeasible with conventional schedulability analyses. This work underscores the importance of an integrated verification approach for enhancing safety and reliability in autonomous systems.
Speaker bio: Abdullah is a CS PhD candidate at NC State, advised by Dr. Zhishan Guo. His research interests include real-time systems and cyber-physical systems.

Swastik Mittal

Details:
- April 18, 2025, 12:00 p.m. IN PERSON ONLY
- 3211 EB2
Title: T-Tex: Timed Threaded Execution for Real-time Security and Safety
Speaker: Swastik Mittal
Abstract: Task scheduling and resource management are increasingly subject to attacks exposing system vulnerabilities, particularly on multi-core processors with an attack surface crossing cores and tasks with different privileges. Meanwhile, modern real-time systems utilize multi-core environments, where delay attacks can force deadline misses.
This work proposes “Timed Threaded Execution” (T-Tex), a method to detect such security attacks based on monitoring time dilation induced by unexplained delays in general, and more specifically for OpenMP. T-Tex extends OpenMP by exposing it to timed monitoring of code execution. It contributes novel compilation techniques for timed instrumentation exemplified for LLVM via multi-phase profiling using OpenMP tracing (OMPT) capabilities. T-Tex also contributes Linux kernel modifications to monitor thread-level execution time across context switches between threads. Experiments on a real platform demonstrate that T-Tex can detect 100% of delay-based intrusions constrained by timer granularity to an unprecedented 60 µs vulnerability threshold at a performance overhead of 11% to 72% for Parsec and Daphne benchmarks.
Speaker bio: Swastik Mittal is a fifth-year PhD student working on real-time systems. This talk presents T-Tex, which has been accepted at ICCPS '25.