Extent and Effects of Error Propagation and Recovery Mechanisms in Cache Memory Systems

Sponsor: NSF

Commercial microprocessor systems are being operated at ever higher clock rates. Faster clocks reduce the time available to fetch data from cache memory and increase the probability of transient errors in cache memory systems. Besides errors when reading data, an error may also occur in the processor subsystem, which may then write erroneous data into the cache memory. Error-correcting codes allow detection of, and recovery from, some of these errors within the limits of the code word. Fast error detection enables damage containment and reduces the recovery time, which could otherwise be very costly.
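To make "within the code word limits" concrete, the following is a minimal sketch of a textbook extended Hamming (8,4) SEC-DED code (single-error correction, double-error detection), the kind of code commonly used to protect cache lines. The page does not say which code this project uses, so the choice of code and the names `encode` and `decode` are illustrative assumptions, not the project's design.

```python
from __future__ import annotations


def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) code word.

    Index 0 holds the overall parity bit; indices 1..7 use the classic
    Hamming(7,4) layout with parity bits at positions 1, 2 and 4.
    """
    assert len(data) == 4 and all(b in (0, 1) for b in data)
    code = [0] * 8
    # Data bits occupy the non-power-of-two positions.
    for pos, bit in zip((3, 5, 6, 7), data):
        code[pos] = bit
    # Parity bit p covers every position whose index has bit p set.
    for p in (1, 2, 4):
        for i in range(1, 8):
            if i & p and i != p:
                code[p] ^= code[i]
    # Overall parity makes the XOR of all 8 bits equal to zero.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: list[int]) -> tuple[str, list[int] | None]:
    """Check a code word; correct one flipped bit, detect two flipped bits."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume a single-bit error: the syndrome names its position
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        # Detectable, but beyond what the code can correct.
        return "uncorrectable", None
    return status, [code[i] for i in (3, 5, 6, 7)]


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    word[5] ^= 1  # simulate a transient single-bit upset
    print(decode(word))
```

Running the example prints `('corrected', [1, 0, 1, 1])`. Three or more flipped bits can be silently miscorrected, which is the code-word limit the text refers to. The sketch also shows why errors originating in the processor are harder: if a faulty register value is written to the cache, `encode` produces a perfectly valid code word for the wrong data, and the cache-side check cannot notice it.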

The goals of the proposed research are: (1) to study the extent of error propagation due to transient faults in a computer system when the fault originates in either a processor register or a cache location; and (2) to develop the techniques and hardware support needed for early detection of, and recovery from, such errors in computation tasks with low overhead and little performance loss. Our techniques will span a broad spectrum, from detection only to full error recovery.


Our research identifies architectural features that, if provided in commercial microprocessors, would make them suitable for use in fault-tolerant applications. The intent is to keep the performance impact on normal operation to a minimum.

