An ideal solution for soft error tolerance should hide the effect of soft errors from the user and provide correct results at the expected time. Software solutions are attractive because they can provide flexible reliability without requiring any hardware modifications. Our investigation of state-of-the-art error recovery techniques reveals that they suffer from poor coverage (the ability to detect and correctly recover from soft errors). This paper presents InCheck (In-Application Checkpointing and Recovery), an effective, safe, and timely software technique for complete error coverage. The key features of InCheck are verified register preservation, single-memory-location checkpoints, and safe and timely recovery. To evaluate the effectiveness of InCheck, we performed more than 210,000 fault injection experiments on different hardware components of an ARM Cortex-A53-like processor running MiBench applications. The original and SWIFTR-protected (state-of-the-art) programs produced wrong outputs in 8000 and 1800 instances, respectively, whereas programs protected by InCheck experienced no failures.
Title of host publication: Proceedings of the 54th Annual Design Automation Conference 2017, DAC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Published: Jun 18 2017
Conference: 54th Annual Design Automation Conference, DAC 2017 - Austin, United States
Duration: Jun 18 2017 → Jun 22 2017
ASJC Scopus subject areas
- Computer Science Applications
- Control and Systems Engineering
- Electrical and Electronic Engineering
- Modeling and Simulation