Low overhead checkpointing and rollback recovery scheme for distributed systems

Zhijun Tong, Richard Y. Kain, W. T. Tsai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

A major obstacle in implementing a rollback recovery scheme for fault tolerance in a concurrent distributed system is the domino effect. A low overhead checkpointing scheme is proposed to prevent this effect. Each process saves its state periodically. The state-save synchronization among processes is implemented by bounding clock drifts. A communication protocol that assures that all saved states are consistent is developed.
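
The abstract does not spell out the protocol, so the following is only a minimal Python sketch of one common way to keep periodically saved states consistent. It is not the authors' scheme. It assumes that each process numbers its checkpoints by period (an "epoch"), tags every outgoing message with its current epoch, and saves its state early when a message arrives from a later epoch. The names Process, period, epoch, tick, send, and receive are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process that checkpoints once per period of its local clock."""
    name: str
    period: float                 # checkpoint period (assumed parameter)
    state: int = 0                # toy application state: a running sum
    epoch: int = 0                # index of the latest checkpoint taken
    checkpoints: dict = field(default_factory=dict)  # epoch -> saved state

    def __post_init__(self):
        self.checkpoints[0] = self.state  # the initial state is checkpoint 0

    def take_checkpoint(self, epoch: int) -> None:
        """Save the current state as the checkpoint that opens `epoch`."""
        self.epoch = epoch
        self.checkpoints[epoch] = self.state

    def tick(self, local_time: float) -> None:
        """Take the periodic checkpoint if one is due on the local clock."""
        due = int(local_time // self.period)
        if due > self.epoch:
            self.take_checkpoint(due)

    def send(self, value: int) -> tuple:
        """Tag each outgoing message with the sender's current epoch."""
        return (self.epoch, value)

    def receive(self, message: tuple) -> None:
        """Checkpoint before delivery if the sender is already in a later epoch."""
        sender_epoch, value = message
        if sender_epoch > self.epoch:
            self.take_checkpoint(sender_epoch)
        self.state += value


# Assumed scenario: p's clock runs slightly ahead of q's.
p = Process("p", period=10.0)
q = Process("q", period=10.0)
p.tick(10.0)          # p's clock reaches the boundary: p opens epoch 1
q.tick(9.8)           # q's clock lags: q is still in epoch 0
q.receive(p.send(5))  # q saves its state first, then applies the message
```

In this sketch the checkpoint q saves for epoch 1 does not include the effect of a message that p sent after its own epoch 1 checkpoint, so the two saved states agree with each other. Under the stated assumption, a bound on clock drift would confine such early saves to a short window around each period boundary.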

Original language: English (US)
Title of host publication: Proceedings - Symposium on Reliability in Distributed Software and Database Systems
Editors: Anon
Place of Publication: Piscataway, NJ, United States
Publisher: Publ by IEEE
Pages: 12-20
Number of pages: 9
State: Published - 1989
Externally published: Yes
Event: Proceedings of the Eighth Symposium on Reliable Distributed Systems - Seattle, WA, USA
Duration: Oct 10 1989 → Oct 12 1989

Other

Other: Proceedings of the Eighth Symposium on Reliable Distributed Systems
City: Seattle, WA, USA
Period: 10/10/89 → 10/12/89

ASJC Scopus subject areas

  • Software
