TY - JOUR
T1 - Root cause analysis of soft-error-induced failures from hardware and software perspectives
AU - Jung, Jinhyo
AU - Ko, Yohan
AU - So, Hwisoo
AU - Lee, Kyoungwoo
AU - Shrivastava, Aviral
N1 - Funding Information:
This work was partially supported by funding from National Science Foundation Grants No. CNS 1525855, CPS 1646235, CCF 1723476 - the NSF/Intel joint research center for Computer Assisted Programming for Heterogeneous Architectures (CAPA), Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2021-0-00155, Context and Activity Analysis-based Solution for Safe Childcare), National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2022-00165225), and Samsung Electronics Co., Ltd (FOUNDRY-202108DD007F). We would like to thank Editage (www.editage.co.kr) for English language editing.
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/9
Y1 - 2022/9
N2 - Because the dangers of soft errors are increasing with continued technology scaling, reliability against soft errors is becoming an important design concern for modern embedded systems. Various schemes have been proposed to protect embedded systems from the threat of soft errors, but they incur considerable overheads in terms of cost and performance. Selective protection techniques seem promising because they can achieve high levels of protection with low overhead. Though these techniques can be applied to any system, the most vulnerable parts must first be identified. We, therefore, present CFA, a comprehensive failure analysis framework that can analyze the vulnerability of microarchitectural components and software instructions through intensive fault injection campaigns. With CFA, we also explore the vulnerability of ten benchmarks from the MiBench benchmark suite. We found that protecting a part of the system heavily affects the reliability of the other parts. Therefore, all combinations of protection methods must be examined to present the most efficient and effective protection guidelines. Throughout the experiments, we observed that protection methods offered by single-perspective analyses are sub-optimal. On the other hand, CFA finds the optimal solution in every case, reducing the AVF of a system by up to 82% with minimal protection.
KW - Failure analysis
KW - Fault injection
KW - Reliability
KW - Soft error
KW - Transient fault
UR - http://www.scopus.com/inward/record.url?scp=85134769400&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134769400&partnerID=8YFLogxK
U2 - 10.1016/j.sysarc.2022.102652
DO - 10.1016/j.sysarc.2022.102652
M3 - Article
AN - SCOPUS:85134769400
SN - 1383-7621
VL - 130
JO - Journal of Systems Architecture
JF - Journal of Systems Architecture
M1 - 102652
ER -