TY - GEN
T1 - A mechanism for online diagnosis of hard faults in microprocessors
AU - Bower, Fred A.
AU - Sorin, Daniel J.
AU - Ozev, Sule
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2005
Y1 - 2005
N2 - We develop a microprocessor design that tolerates hard faults, including fabrication defects and in-field faults, by leveraging existing microprocessor redundancy. To do this, we must: detect and correct errors, diagnose hard faults at the field deconfigurable unit (FDU) granularity, and deconfigure FDUs with hard faults. In our reliable microprocessor design, we use DIVA dynamic verification to detect and correct errors. Our new scheme for diagnosing hard faults tracks instructions' core structure occupancy from decode until commit. If a DIVA checker detects an error in an instruction, it increments a small saturating error counter for every FDU used by that instruction, including that DIVA checker. A hard fault in an FDU quickly leads to an above-threshold error counter for that FDU and thus diagnoses the fault. For deconfiguration, we use previously developed schemes for functional units and buffers, and we present a scheme for deconfiguring DIVA checkers. Experimental results show that our reliable microprocessor quickly and accurately diagnoses each hard fault that is injected and continues to function, albeit with somewhat degraded performance.
AB - We develop a microprocessor design that tolerates hard faults, including fabrication defects and in-field faults, by leveraging existing microprocessor redundancy. To do this, we must: detect and correct errors, diagnose hard faults at the field deconfigurable unit (FDU) granularity, and deconfigure FDUs with hard faults. In our reliable microprocessor design, we use DIVA dynamic verification to detect and correct errors. Our new scheme for diagnosing hard faults tracks instructions' core structure occupancy from decode until commit. If a DIVA checker detects an error in an instruction, it increments a small saturating error counter for every FDU used by that instruction, including that DIVA checker. A hard fault in an FDU quickly leads to an above-threshold error counter for that FDU and thus diagnoses the fault. For deconfiguration, we use previously developed schemes for functional units and buffers, and we present a scheme for deconfiguring DIVA checkers. Experimental results show that our reliable microprocessor quickly and accurately diagnoses each hard fault that is injected and continues to function, albeit with somewhat degraded performance.
UR - http://www.scopus.com/inward/record.url?scp=33749413197&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749413197&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2005.8
DO - 10.1109/MICRO.2005.8
M3 - Conference contribution
AN - SCOPUS:33749413197
SN - 0769524400
SN - 9780769524405
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 197
EP - 208
BT - MICRO-38
T2 - MICRO-38: 38th Annual IEEE/ACM International Symposium on Microarchitecture
Y2 - 12 November 2005 through 16 November 2005
ER -