Isabella Hu and Brian Qiu
- Details:
- Jan 31, 2025, 12:00 p.m. via Google Meet
- 3211 EB2
- Title: Tools for Taming Planet-Scale Computing:
Maximizing Data Center Efficiency for the AI Era
- Speakers: Isabella Hu and Brian Qiu
- Abstract: The increasing computational demands of
AI are forcing a paradigm shift in how we design and operate data
centers. This talk introduces the critical tools needed to navigate the
complexities of planet-scale computing. We’ll explore how traces,
metrics, and profiles provide a comprehensive view of distributed system
performance. This talk will cover how these instrumentation tools
empower us to identify bottlenecks, optimize resource allocation, and
ultimately achieve significant gains in data center efficiency, ensuring
that our infrastructure thrives in the face of the AI revolution.
- Speaker bios: Isabella Hu is a Software Engineer at
Google and earned a Master’s degree from Carnegie Mellon University. Her
work focuses on optimizing computing performance and AI-driven decision
making within this domain. Brian Qiu is a Software Engineer at Google
with 3 years of experience in system performance and distributed
systems. His work focuses on scaling infrastructure to handle fleetwide
tracing data.
Dr. Cong Guo
- Details:
- Mar 7, 2025, 12:00 p.m. IN PERSON ONLY
- 3211 EB2
- Title: Unlocking New Opportunities in Quantization
and Sparsity Co-Design for Large Language Models
- Speaker: Dr. Cong Guo
- Abstract: The rapid expansion of Large Language
Models (LLMs) has significantly advanced natural language processing.
However, the increasing size of these models has led to inference costs
that outpace the development of acceleration hardware. To address this
challenge, various techniques have been proposed, including structured
sparsity patterns to enhance execution efficiency and adaptive numerical
data types to balance precision and performance. Additionally, methods
focusing on managing outliers in model data have been developed to
maintain accuracy during quantization. Building upon these foundations,
I will introduce our latest work, Transitive Array, a
novel framework that unifies quantization and sparsity. Transitive Array
minimizes redundant computations and optimizes memory usage, offering a
hardware-friendly solution for efficient LLM inference. This advancement
presents a new opportunity to co-design quantization and sparsity in
LLMs, effectively bridging the gap between escalating model complexities
and current hardware limitations.
- Speaker bio: Cong Guo is a Postdoctoral Associate
at Duke University, collaborating with Professors Hai Li and Yiran Chen.
He earned his Ph.D. in Computer Science from Shanghai Jiao Tong
University and was honored with the 2023 Shanghai Jiao Tong University
Outstanding Doctoral Dissertation Award. Cong Guo’s research interests
lie in computer architecture and high-performance computing, with a
focus on software-hardware co-optimization to accelerate efficient
artificial intelligence applications. His work includes designing novel
architectures and systems for neural networks, particularly in the areas
of sparsity and quantization. Over the past five years, he has published
more than 10 papers in leading conferences such as ISCA, MICRO, HPCA,
and ASPLOS. His work received an Honorable Mention in the 2022 IEEE
Micro Top Picks.
Abdullah Al Arafat
- Details:
- April 11, 2025, 12:00 p.m. IN PERSON ONLY
- 3211 EB2
- Title: Soteria: A Formal Digital-Twin-Enabled
Framework for Safety-Assurance of Latency-Aware Cyber-Physical
Systems
- Speaker: Abdullah Al Arafat
- Abstract: Verifying the safety of latency-aware
cyber-physical systems is both critical and challenging due to the
interaction between continuous physical dynamics and discrete
computational constraints. This paper introduces SOTERIA, a formal
framework that integrates digital twins for ensuring safety in these
systems. SOTERIA models both the physical dynamics and computational
behavior, enabling integrated verification within a specific operating
environment. This approach goes beyond conventional methods that either
treat physical and computational aspects separately or rely on overly
conservative worst-case analyses. By modeling hybrid dynamics alongside
computational models and operating environments, SOTERIA verifies both
functional and timing correctness. Leveraging established verification
tools, SOTERIA determines whether end-to-end latencies meet formal
specifications, bridging the gap between computational and physical
requirements. We first introduce a simple example of a 1D adaptive
cruise control system to illustrate its effectiveness. We then present
findings from a case study using the F1Tenth racing car platform and the
UPPAAL tool to demonstrate SOTERIA’s effectiveness in realistic
scenarios, enabling safety verification that was previously infeasible
with conventional schedulability analyses. This work underscores the
importance of an integrated verification approach for enhancing safety
and reliability in autonomous systems.
- Speaker bio: Abdullah is a CS PhD candidate at NC
State, advised by Dr. Zhishan Guo. His research interests include
real-time systems and cyber-physical systems.
Swastik Mittal
- Details:
- April 18, 2025, 12:00 p.m. IN PERSON ONLY
- 3211 EB2
- Title: T-Tex: Timed Threaded Execution for
Real-time Security and Safety
- Speaker: Swastik Mittal
- Abstract: Task scheduling and resource management
are increasingly subject to attacks exposing system vulnerabilities,
particularly on multi-core processors with an attack surface crossing
cores and tasks with different privileges. Meanwhile, modern real-time
systems utilize multi-core environments, where delay attacks can force
deadline misses.
This work proposes “Timed Threaded Execution” (T-Tex), a method to
detect such security attacks based on monitoring time dilation induced
by unexplained delays in general, and more specifically for OpenMP.
T-Tex extends OpenMP by exposing it to timed monitoring of code
execution. It contributes novel compilation techniques for timed
instrumentation exemplified for LLVM via multi-phase profiling using
OpenMP tracing (OMPT) capabilities. T-Tex also contributes Linux kernel
modifications to monitor thread-level execution time across context
switches between threads. Experiments on a real platform demonstrate
that T-Tex can detect 100% of delay-based intrusions constrained by
timer granularity to an unprecedented 60 µs vulnerability threshold at a
performance overhead of 11% to 72% for Parsec and Daphne benchmarks.
- Speaker bio: Swastik Mittal is a fifth-year PhD student working on
real-time systems. This talk presents T-Tex, which has been accepted at
ICCPS 25.