Markov reliability models of fault-tolerant distributed computing systems

M. Liron, B. Melamed, S. S. Yau

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


A hierarchical view of fault-tolerant distributed computers is presented, viewing a distributed computing system as composed of interconnected, interacting, functional modules. Each module, modeled by a directed-state graph, is governed by internal random failure events and counteracting recovery processes, and also by coupling of external random events from other modules. It is shown that, under certain assumptions, the system is governed by a multidimensional Markov process, with non-Markov module processes as components. Mathematical properties of this model are formally analyzed. Performance measures are found from the steady-state distribution and visitation rate of each system and module state. A numerical example illustrates the model's practical application. The results are shown to fit the actual statistical data collected on an AT&T Bell Laboratories Electronic Switching System very well.
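The abstract's last technical step, deriving performance measures from the steady-state distribution and the visitation rate of each state, can be illustrated with a minimal sketch. This is not the paper's model: it assumes a single hypothetical module with two states (up, down), a continuous-time Markov chain, and made-up failure and repair rates `lam` and `mu`. The function names `steady_state` and `visitation_rates` are likewise illustrative. The visitation rate of a state is taken here to mean the long-run rate of entries into that state.

```python
from fractions import Fraction

def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = len(Q)
    # Transpose Q so each row is one balance equation, then replace the
    # last (redundant) equation with the normalization sum(pi) = 1.
    A = [[Fraction(Q[i][j]) for i in range(n)] for j in range(n)]
    b = [Fraction(0)] * n
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]

def visitation_rates(Q, pi):
    """Long-run rate of entries into each state j: sum over i != j of pi_i * q_ij."""
    n = len(Q)
    return [sum(pi[i] * Q[i][j] for i in range(n) if i != j) for j in range(n)]

# Hypothetical rates (assumptions, not values from the paper).
lam = Fraction(1, 100)  # failure rate: up -> down
mu = Fraction(1, 2)     # repair rate: down -> up

# Generator matrix; state 0 = up, state 1 = down. Each row sums to zero.
Q = [[-lam, lam],
     [mu, -mu]]

pi = steady_state(Q)
rates = visitation_rates(Q, pi)
print("steady-state distribution:", pi)
print("visitation rates:", rates)
```

For this two-state case the result can be checked by hand: the steady-state availability is `mu / (lam + mu)`, and both states are entered at the same long-run rate, since every failure is followed by exactly one repair.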

Original language: English (US)
Pages (from-to): 183-206
Number of pages: 24
Journal: Information Sciences
Issue number: 3
State: Published - Dec 31 1986
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Information Systems and Management
  • Artificial Intelligence
  • Theoretical Computer Science
  • Control and Systems Engineering
  • Computer Science Applications

