Course Information
Room: EB2 3211
Fridays 11a-12p
Convolutional Neural Networks (CNNs) are widely used for deep learning tasks. CNN pruning is an important method for adapting a large CNN model trained on general datasets to fit a more specialized task or a smaller device. The key challenge lies in deciding which filters to remove in order to maximize the quality of the pruned networks while satisfying the constraints. The process is time-consuming due to the enormous configuration space and the slowness of CNN training. The problem has drawn many efforts from the machine learning field, which try to reduce the set of network configurations to explore. This work tackles the problem distinctively from a programming systems perspective, trying to speed up the evaluations of the remaining configurations through computation reuse via a compiler-based framework. We empirically uncover the existence of composability in the training of a collection of pruned CNN models and point out the opportunities for computation reuse. We then propose composability-based CNN pruning and design a compression-based algorithm to efficiently identify the set of CNN layers to pre-train for maximizing their reuse benefits in CNN pruning. We further develop a compiler-based framework named Wootz, which, for an arbitrary CNN, automatically generates code that builds a Teacher-Student scheme to materialize composability-based pruning. Experiments show that network pruning enabled by Wootz shortens the state-of-the-art pruning process by up to 186X while producing significantly improved pruning results.
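As background for the pruning problem the abstract describes, the sketch below shows one common way a single configuration is derived: ranking a layer's filters by L1 norm and keeping the strongest ones. This is standard magnitude-based filter pruning, not Wootz's own algorithm; the names `prune_layer` and `keep` are illustrative.

```python
# Magnitude-based filter pruning for one convolutional layer:
# rank filters by L1 norm of their weights and keep the top `keep`.
# Each filter is represented here as a flat list of weights.

def l1_norm(filt):
    """Sum of absolute weights of one filter."""
    return sum(abs(w) for w in filt)

def prune_layer(filters, keep):
    """Return (sorted) indices of the `keep` filters with largest L1 norm."""
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]), reverse=True)
    return sorted(ranked[:keep])

filters = [[0.1, -0.2], [1.0, 0.9], [-0.05, 0.02], [0.5, -0.4]]
print(prune_layer(filters, keep=2))  # → [1, 3], the two strongest filters
```

The configuration space the abstract mentions comes from choosing such a `keep` value independently for every layer, which is why reusing pre-trained layer blocks across configurations pays off.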
This is on Thursday 19 September. Same room: EB2 3211
Onkar Patil
Presenter: Tao Wang
Abstract
Ad hoc synchronizations are pervasive in multi-threaded programs. Due to their diversity and complexity, understanding the enforced synchronization relationships of ad hoc synchronizations is challenging but crucial to multi-threaded program development and maintenance. Existing techniques can partially detect primitive ad hoc synchronizations, but they cannot recognize complete implementations or infer the enforced synchronization relationships. In this paper, we propose a framework to automatically identify complex ad hoc synchronizations in full and infer their synchronization relationships for barriers. We instantiate the framework with a tool called BARRIERFINDER, which features
various techniques, including program slicing and bounded symbolic execution, to efficiently explore the interleaving space of ad hoc synchronizations within multi-threaded programs for their traces. BARRIERFINDER then uses these traces to recognize ad hoc barriers. Our evaluation shows that BARRIERFINDER is both effective and efficient in recognizing ad hoc barriers automatically.
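To make the target concrete, here is a minimal example of the kind of ad hoc barrier (a shared counter plus a spin loop, built without any library barrier primitive) that tools like BARRIERFINDER aim to recognize. This illustrates the synchronization idiom only; it is not part of the tool.

```python
# An ad hoc barrier: each thread increments a shared counter, then
# spins until all N threads have arrived. No threading.Barrier is used,
# which is exactly why such constructs are hard for tools to recognize.
import threading

N = 4
arrived = 0
lock = threading.Lock()
order = []

def worker(tid):
    global arrived
    with lock:
        arrived += 1
    while arrived < N:   # ad hoc barrier: busy-wait for the last arrival
        pass
    order.append(tid)    # everything here runs strictly after the barrier

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(order))     # all N threads made it past the barrier
```

The enforced relationship is that no `order.append` can happen before every thread's increment, which is the kind of synchronization fact the framework infers from traces.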
This is on Tuesday (it replaces the usual seminar on Friday 18 Oct).
Xu Liu
Tuesday October 22, 2019 09:30 AM, EB2 3211
Abstract: Inefficiencies abound in complex, layered software. A variety of inefficiencies show up as wasteful memory operations, such as redundant or useless memory loads and stores. Aliasing, limited optimization scopes, and insensitivity to input and execution contexts act as severe deterrents to static program analysis. Microscopic observation of whole executions at instruction- and operand-level granularity breaks down abstractions and helps recognize redundancies that masquerade in complex programs. In this talk, I will describe various wasteful memory operations, which pervasively exist in modern software packages and expose great potential for optimization. I will discuss the design of a fine-grained instrumentation-based profiling framework that identifies wasteful operations in their contexts, which guides nontrivial performance improvement. Furthermore, I will show our recent improvement to the profiling framework by abandoning instrumentation, which reduces the runtime overhead from 10x to 3% on average. I will show how our approach works for native binaries and various managed languages such as Java, yielding new performance insights for optimization.
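One inefficiency class from the abstract, the dead store (a store to an address that is overwritten before any load reads it), can be detected by replaying a memory trace. The sketch below illustrates the principle on a simplified trace; real tools observe actual instructions and operands, and the trace format here is invented.

```python
# Detect dead stores in a simplified memory trace: a store is "dead"
# if the same address is stored again before any intervening load.

def dead_stores(trace):
    """trace: list of ('load'|'store', addr). Return indices of dead stores."""
    last_store = {}   # addr -> index of most recent store not yet read
    dead = []
    for i, (op, addr) in enumerate(trace):
        if op == 'store':
            if addr in last_store:
                dead.append(last_store[addr])  # overwritten before any read
            last_store[addr] = i
        else:  # 'load'
            last_store.pop(addr, None)         # the pending store was used
    return dead

trace = [('store', 0x10), ('store', 0x10), ('load', 0x10), ('store', 0x20)]
print(dead_stores(trace))  # → [0]: the first store to 0x10 is never read
```

Attributing each dead store back to its calling context is what turns such raw findings into actionable optimization guidance.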
Short Bio: Xu Liu is an assistant professor in the Department of Computer Science at the College of William & Mary. He obtained his Ph.D. from Rice University in 2014 and joined the College of William & Mary in the same year. Prof. Liu works on building performance tools to pinpoint and optimize inefficiencies in HPC code bases. He has developed several open-source profiling tools, which are used worldwide at universities, DOE national laboratories, and industrial companies. Prof. Liu has published a number of papers in high-quality venues. His papers received Best Paper Awards at SC’15, PPoPP’18, and PPoPP’19, ASPLOS’17 Highlights, and a Distinguished Paper Award at ICSE’19. His recent ASPLOS’18 paper was selected as an ACM SIGPLAN Research Highlight in 2019 and nominated for CACM Research Highlights. Prof. Liu is the recipient of the 2019 IEEE TCHPC Early Career Researchers Award for Excellence in High Performance Computing. Prof. Liu has served on the program committees of conferences such as SC, PPoPP, IPDPS, CGO, HPCA, and ASPLOS.
Host: Frank Mueller, CSC
Speaker: Na Meng, Virginia Tech
http://people.cs.vt.edu/nm8247/
In this talk, I will present our recent research that intends to bridge the gap between program complexity and developers’ programming capabilities. There are two parts to my talk. In the first part, I will introduce our empirical studies on developers’ secure coding practices. By crawling and analyzing developers’ technical discussions on the StackOverflow website, we identified various programming challenges that developers face when building security functionalities. We also showed security vulnerabilities due to developers’ API misuses. Furthermore, we examined the reliability of security suggestions on StackOverflow and revealed a worrisome reality in the software development industry. In the second part, I will present our recent tool that recommends code refactorings for developers. All our empirical studies and techniques have the potential to help developers (1) better understand program complexity and the complexity of software maintenance, and (2) improve program maintenance as well as software quality.
This is on Monday at 4pm
NC native and Duke alumnus Fred Brooks is Kenan Professor, Emeritus in the Department of Computer Science at UNC-Chapel Hill, which he founded in 1964 and chaired for twenty years. Prior to coming to UNC, Dr. Brooks worked for nine years with IBM. He was an architect of the IBM Stretch supercomputer and the Harvest cryptanalytic engine. He then served as Corporate Project Manager for the IBM System/360 mainframes, including the development of the System/360 computer family hardware and then the Operating System/360 software. His most important technical decision was to change IBM’s byte size from 6 to 8 bits, enabling lower-case characters.
At UNC, Dr. Brooks has conducted research in computer architecture, software engineering, and interactive 3-D computer graphics (“virtual reality”). His best-known books are The Mythical Man-Month: Essays on Software Engineering (1975, 1995), Computer Architecture: Concepts and Evolution (with G.A. Blaauw, 1997), and The Design of Design (2010). Dr. Brooks has received the U.S. National Medal of Technology and the A.M. Turing Award of the ACM.
Fred has cultivated an active Christian presence in the UNC community. Since 1965, he has advised Focus, the graduate chapter of InterVarsity Christian Fellowship at UNC. He chairs the Board of the NC Study Center (“Battle House”).
This is on Thursday at 9:30
Pacific Northwest National Laboratory
Thursday November 07, 2019 09:30 AM
Location: 3211, EB2 NCSU Centennial Campus
Abstract: In this talk I will present novel high-performance algorithmic techniques and data structures to build a scalable sparse tensor library and a benchmark suite on multicore CPUs and graphics co-processors (GPUs). A tensor can be regarded as a multiway array, generalizing matrices to more than two dimensions. When used to represent multifactor data, tensor methods can help analysts discover latent structure; this capability has found numerous applications in data modeling and mining in such domains as healthcare analytics, social network analytics, computer vision, signal processing, and neuroscience, to name a few. In addition, sparse tensor algebra has proven useful in further applications, such as quantum chemistry and deep learning. This talk will cover my recently proposed performance-efficient and space-saving sparse tensor format (named “HiCOO”), based on which a sparse tensor library (named “HiParTi”) and a sparse tensor benchmark suite (named “PASTA”) are built. The future directions of tensors and their influence on applications and computer architectures will be illustrated along with recent trends.
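For context, the sketch below shows the baseline COO (coordinate) format that HiCOO improves upon: a sparse 3-way tensor stored as index triples plus values. HiCOO's contribution, not shown here, is grouping these entries into small blocks so the index arrays compress; the helper name `coo_from_dense` is invented for illustration.

```python
# Baseline COO storage for a sparse 3-way tensor: each nonzero is kept
# as an (i, j, k) index triple alongside its value.

def coo_from_dense(dense):
    """dense: nested lists indexed [i][j][k]. Return (indices, values)."""
    inds, vals = [], []
    for i, mat in enumerate(dense):
        for j, row in enumerate(mat):
            for k, v in enumerate(row):
                if v != 0:
                    inds.append((i, j, k))
                    vals.append(v)
    return inds, vals

dense = [[[0, 1], [0, 0]], [[2, 0], [0, 3]]]
inds, vals = coo_from_dense(dense)
print(inds)  # → [(0, 0, 1), (1, 0, 0), (1, 1, 1)]
print(vals)  # → [1, 2, 3]
```

Because COO spends a full index per mode per nonzero, its memory cost grows quickly with tensor order, which is the overhead blocked formats like HiCOO target.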
Short Bio: Jiajia Li is a research scientist in the High Performance Computing group at Pacific Northwest National Laboratory (PNNL). She received her Ph.D. degree from the Georgia Institute of Technology in 2018. Her current research emphasizes optimizing tensor methods, especially for sparse data from diverse applications, by utilizing various parallel architectures. She is an awardee of the Best Student Paper Award at SC’18, a Best Paper Finalist at PPoPP’19, and “A Rising Star in Computational and Data Sciences”. She has served on the technical program committees of conferences such as PPoPP, SC, ICS, IPDPS, ICPP, HiPC, and Euro-Par. Earlier, she received a Ph.D. degree from the Institute of Computing Technology at the Chinese Academy of Sciences, China, and a B.S. degree in Computational Mathematics from Dalian University of Technology, China. Please check her website for more information: http://jiajiali.org .
This is on Thursday at 9:30
Seminar Time: 9:30 AM (talk begins)
Seminar Place: Room 3211, EB2, NCSU Centennial Campus
Lili Su
Federated Learning (FL) is a new distributed learning paradigm proposed by Google. The goal of FL is to enable the cloud (i.e., the learner) to train a model without collecting the training data from users' mobile devices. Compared with traditional learning, FL suffers serious security issues, and several practical constraints call for new security strategies. Towards quantitative and systematic insights into the impacts of those security issues, we formulated and studied the problem of Byzantine-resilient Federated Learning. We proposed two robust learning rules that secure gradient descent against Byzantine faults. The estimation error achieved under our more recently proposed rule is order-optimal in the minimax sense.
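To illustrate the flavor of Byzantine-robust aggregation, the sketch below uses the coordinate-wise median, a standard robust rule in this literature; the talk's specific rules may differ. Honest workers send true gradients, while a Byzantine worker may send an arbitrary vector.

```python
# Byzantine-robust gradient aggregation via the coordinate-wise median:
# the learner takes the median of each coordinate across all workers,
# so a minority of arbitrary (Byzantine) vectors cannot drag the result far.
import statistics

def robust_aggregate(gradients):
    """Coordinate-wise median of the workers' gradient vectors."""
    return [statistics.median(coord) for coord in zip(*gradients)]

honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]   # near the true gradient
byzantine = [[100.0, -100.0]]                    # adversarial outlier
agg = robust_aggregate(honest + byzantine)
print(agg)  # stays close to the honest gradients despite the outlier
```

A plain average here would be pulled to roughly 25 in the first coordinate, while the median stays near 1, which is the sense in which such rules "secure" gradient descent.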
https://www.csc.ncsu.edu/research/colloquia/seminar-post.php?id=923
Title: Vulnerability Exploit Detection Over Aggregated Container Data
Abstract: Nowadays, Docker containers are widely adopted in the industry for deploying applications in many Information Technology (IT) contexts. However, the short lifespan of containers running dynamic workloads makes detecting security exploits a difficult task. In this paper, we present a method of training exploit detection models using data aggregated over multiple containers. Our results using an autoencoder-based model show advantages in using aggregated container data rather than single container data in terms of detection and false positive rates. In addition, our experiments show that the system can gather data from similar containerized applications and detect exploits in real time, which is applicable for real world scenarios.
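The detection principle behind autoencoder-based models is to learn a compact summary of normal behavior and flag inputs whose reconstruction error is large. The sketch below illustrates that principle with a per-feature mean standing in for the autoencoder; all names, features, and thresholds are illustrative, not the paper's system.

```python
# Reconstruction-error anomaly detection, simplified: "train" on normal
# feature vectors aggregated across containers, then flag any sample
# whose squared reconstruction error exceeds a threshold.

def fit(normal_samples):
    """Summarize normal behavior as the per-feature mean (autoencoder stand-in)."""
    n = len(normal_samples)
    return [sum(col) / n for col in zip(*normal_samples)]

def reconstruction_error(model, sample):
    return sum((m - x) ** 2 for m, x in zip(model, sample))

normal = [[1.0, 0.0], [1.2, 0.1], [0.8, 0.0]]    # e.g. syscall-rate features
model = fit(normal)
threshold = max(reconstruction_error(model, s) for s in normal) * 2

benign = [1.1, 0.05]
exploit = [9.0, 5.0]                              # anomalous behavior
print(reconstruction_error(model, benign) <= threshold)   # flagged: no
print(reconstruction_error(model, exploit) > threshold)   # flagged: yes
```

Aggregating `normal` across many short-lived containers of the same application, as the paper proposes, gives the model enough data to fit even though each individual container lives briefly.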