Date October 27, 2016
Speaker Mattan Erez
University of Texas, Austin
Title Toward Exascale Resilience: Hardware Mechanisms and Containment Domains
Abstract In this talk I will present a scalable and efficient resiliency scheme based on the concept of Containment Domains. Containment domains are programming and system constructs that encapsulate and express application resiliency needs and interact with the system to tune and specialize error detection, state preservation and restoration, and recovery schemes. Containment domains have weak transactional semantics and are nested to take advantage of the machine hierarchy and to enable distributed and hierarchical state preservation, restoration, and recovery as an alternative to non-scalable and inefficient checkpoint-restart (and its variants). One of the key motivations behind this work is the idea of proportionality, where the resources devoted to a feature are proportional to the needs of the application and scenario. Proportionality is critical to continued scaling and performance under the increasing constraints of bandwidth, power, and energy. Essentially, one-size-fits-all and worst-case design approaches are no longer sufficient for building reliable and efficient systems. I will also briefly discuss some of the hardware mechanisms necessary for reliability and resilience and the tradeoffs they offer for proportionality.
Bio Mattan Erez is an Associate Professor and holder of the Temple Foundation Professor Fellowship (#4) at the Department of Electrical and Computer Engineering at the University of Texas at Austin. His research focuses on improving the performance, efficiency, and scalability of computing systems through advances in hardware architecture, software systems, and programming models. His vision is to increase cooperation across system layers and to develop flexible and adaptive mechanisms for proportional resource usage. Mattan received a B.Sc. in Electrical Engineering and a B.A. in Physics from the Technion, Israel Institute of Technology, and his M.S. and Ph.D. in Electrical Engineering from Stanford University. He is a recipient of a Presidential Early Career Award for Scientists and Engineers, an Early Career Research Award from the Department of Energy, and an NSF CAREER Award.
Resources