Oct 21, 2017

Relax: An Architectural Framework for Software Recovery of Hardware Faults

Marc de Kruijf, Shuou Nomura, and Karthikeyan Sankaralingam. Relax: An Architectural Framework for Software Recovery of Hardware Faults. In Proceedings of the 37th International Symposium on Computer Architecture, 2010.

Abstract

As technology scales ever further, device unreliability is creating excessive complexity for hardware to maintain the illusion of perfect operation. In this paper, we consider whether exposing hardware fault information to software and allowing software to control fault recovery simplifies hardware design and helps technology scaling. The combination of emerging applications and emerging many-core architectures makes software recovery a viable alternative to hardware-based fault recovery. Emerging applications tend to have few I/O and memory side-effects, which limits the amount of information that needs checkpointing, and they allow discarding individual sub-computations with small qualitative impact. Software recovery can harness these properties in ways that hardware recovery cannot. We describe Relax, an architectural framework for software recovery of hardware faults. Relax includes three core components: (1) an ISA extension that allows software to mark regions of code for software recovery, (2) a hardware organization that simplifies reliability considerations and provides energy efficiency with hardware recovery support removed, and (3) software support for compilers and programmers to utilize the Relax ISA. Applying Relax to counter the effects of process variation, our results show a 20% energy efficiency improvement for PARSEC applications with only minimal source code changes and simpler hardware.

BibTeX

 @inproceedings{isca10:relax,
   author={Marc de Kruijf and Shuou Nomura and Karthikeyan Sankaralingam},
   title={Relax: An Architectural Framework for Software Recovery of Hardware Faults},
   booktitle="{Proceedings of the 37th International Symposium on Computer Architecture}",
   year={2010},
   abstract = {
 As technology scales ever further, device unreliability is creating
 excessive complexity for hardware to maintain the illusion of perfect
 operation.  In this paper, we consider whether exposing hardware fault
 information to software and allowing software to control fault
 recovery simplifies hardware design and helps technology scaling.
 The combination of emerging applications and emerging many-core
 architectures makes software recovery a viable alternative to
 hardware-based fault recovery. Emerging applications tend to have
 few I/O and memory side-effects, which limits the amount of
 information that needs checkpointing, and they allow discarding
 individual sub-computations with small qualitative impact.  Software
 recovery can harness these properties in ways that hardware recovery
 cannot.
 We describe Relax, an architectural framework for software recovery of
 hardware faults. Relax includes three core components:
 (1) an ISA extension that allows software to mark regions of code for software
 recovery,
 (2) a hardware organization that simplifies reliability considerations and provides
 energy efficiency with hardware recovery support removed, and
 (3) software support for compilers and programmers to utilize the Relax ISA.
 Applying Relax to counter the effects of process variation, our results show
 a 20\% energy efficiency improvement
 for PARSEC applications with only minimal source code changes
 and simpler hardware.
 },
   bib_dl_pdf = {http://www.cs.wisc.edu/vertical/papers/2010/isca10-relax.pdf},
   bib_dl_ppt = {http://www.cs.wisc.edu/vertical/talks/2010/isca10-relax.pptx},
   bib_pubtype = {Refereed Conference},
   bib_rescat = {Architecture}
 }
