gemV-tool: A Comprehensive Soft Error Reliability Estimation Tool for Design Space Exploration

Hwisoo So, Yohan Ko, Jinhyo Jung, Kyoungwoo Lee, Aviral Shrivastava

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

With aggressive technology scaling, soft errors have become a major threat in modern computing systems. Several countermeasures have been proposed in the literature and implemented in actual devices. However, their effectiveness in ensuring error-free computing cannot be ascertained without an accurate reliability estimation methodology. Such a methodology can be built on the vulnerability metric: the probability of system failure as a function of the time the program data are exposed to transient faults. In this work, we present gemV-tool, a comprehensive toolset for estimating system vulnerability, based on the cycle-accurate gem5 simulator. The three main characteristics of the gemV-tool are: (i) fine-grained modeling: vulnerability modeling at a fine granularity through the use of RTL abstraction; (ii) accurate modeling: accurate vulnerability calculation of speculatively executed instructions; and (iii) comprehensive modeling: vulnerability estimation of all the sequential elements in the out-of-order processor core. We validated our vulnerability models through extensive fault injection campaigns with <3% correlation error and 90% statistical confidence. Using the gemV-tool, we made the following observations: (i) the vulnerability of two microarchitectural configurations with similar performance can differ by 82%; (ii) the vulnerability of a processor can vary by more than 10×, depending on the implemented algorithm; and (iii) the vulnerability of each component in the processor varies significantly, depending on the ISA of the processor.
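The idea of vulnerability as exposure time can be illustrated with a minimal sketch. It assumes a simplified lifetime model in which the cycles leading up to a read are vulnerable (a fault there would be consumed) and the cycles leading up to an overwriting write are not. The function name `vulnerable_bit_cycles` and the access-trace format are hypothetical, chosen for illustration only; they are not gemV-tool's interface or its actual model.

```python
from typing import Iterable, Tuple

# Hypothetical trace format: (cycle, operation), operation is "read" or "write".
Access = Tuple[int, str]


def vulnerable_bit_cycles(accesses: Iterable[Access], bits: int) -> int:
    """Sum the bit-cycles during which a fault would reach a later read.

    Assumption (simplified model): an interval that ends in a read is
    vulnerable; an interval that ends in a write is not, because the
    stored value is overwritten before anyone consumes it.
    """
    total = 0
    last_cycle = None  # cycle of the previous access; None before the first one
    for cycle, op in sorted(accesses):
        if op not in ("read", "write"):
            raise ValueError(f"unknown operation: {op!r}")
        if op == "read" and last_cycle is not None:
            total += (cycle - last_cycle) * bits
        last_cycle = cycle
    return total


# Illustrative trace for one 32-bit storage element over a 100-cycle run.
trace = [(10, "write"), (30, "read"), (50, "read"), (80, "write"), (90, "read")]
exposed = vulnerable_bit_cycles(trace, bits=32)

# Normalize by the element's total bit-cycles to get a fraction of the run.
fraction = exposed / (32 * 100)
```

In this trace, the intervals 10 to 30, 30 to 50, and 80 to 90 end in reads and count as vulnerable, while 50 to 80 ends in a write and does not. Normalizing by the total bit-cycles gives a per-element fraction that can be compared across configurations.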

Original language: English (US)
Article number: 4573
Journal: Electronics (Switzerland)
Volume: 12
Issue number: 22
State: Published - Nov 2023

Keywords

  • embedded systems
  • fault tolerance
  • protection technique
  • soft error
  • transient fault

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Hardware and Architecture
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
