Designing Resilient Hardware by Treating Software Anomalies
Hardware reliability is a growing concern in the late CMOS era. Components in shipped chips will fail for many reasons. The system must automatically detect, diagnose, and recover from these failures, repair or reconfigure around the failed components, and continue to provide reliable operation. The pervasiveness of the problem across a broad market demands low-cost, general reliability solutions, which rules out both traditional solutions that rely on excessive redundancy and piecemeal solutions that address individual failure modes.
This talk will present the SWAT (SoftWare Anomaly Treatment) project. We observe that a hardware reliability solution need handle only the device faults that become visible to software and cause anomalous software behavior. SWAT therefore detects a variety of hardware faults by watching for anomalous software behavior, using novel zero- to low-cost hardware and software monitors. In the infrequent case that a fault is detected, SWAT invokes a comprehensive diagnosis procedure to isolate the root cause of the fault, repair or reconfigure around it, and invoke recovery. Effectively, SWAT treats hardware faults uniformly as software bugs, leveraging the techniques used for software reliability and amortizing their overhead across both. Our long-term goal is to develop a hardware-software codesigned solution that treats both hardware and software faults with a common framework optimized for overall system reliability.
Sarita Adve is Professor of Computer Science and Director of Research at the Intel/Microsoft-funded Universal Parallel Computing Research Center at Illinois. Her research interests are in computer architecture and systems, parallel computing, and power- and reliability-aware systems. Most recently, she co-developed the memory models for the C++ and Java programming languages based on her early work on data-race-free models, and co-invented the concept of lifetime-reliability-aware processors and dynamic reliability management. She received the ACM SIGARCH Maurice Wilkes award in 2008, was named a University Scholar in 2004, and received an Alfred P. Sloan Research Fellowship in 1998. She received her Ph.D. in Computer Science from Wisconsin in 1993.