Asynchronous failure detectors

Bibliographic Details
Main Authors: Cornejo Collado, Alex (Contributor), Lynch, Nancy Ann (Contributor), Sastry, Srikanth (Contributor)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Association for Computing Machinery, 2014-09-25T19:28:28Z.
Description
Summary: Failure detectors - oracles that provide information about process crashes - are an important abstraction for crash tolerance in distributed systems. Although current failure-detector theory provides great generality and expressiveness, it also poses significant challenges in developing a robust hierarchy of failure detectors. We address some of these challenges by proposing a variant of failure detectors called asynchronous failure detectors and an associated modeling framework. Unlike the traditional failure-detector framework, our framework eschews real time completely. We show that asynchronous failure detectors are sufficiently expressive to include several popular failure detectors. Additionally, we show that asynchronous failure detectors satisfy many desirable properties: they are self-implementable, guarantee that stronger asynchronous failure detectors solve more problems, and ensure that their outputs encode no information other than process crashes. We introduce the notion of a failure detector being representative of a problem to capture the idea that some problems encode the same information about process crashes as their weakest failure detectors do. We show that a large class of problems, called finite problems, do not have representative failure detectors.
National Science Foundation (U.S.) (Science and Technology Center, grant agreement CCF-0939370)
National Science Foundation (U.S.) (NSF Award Number CCF-0726514)
National Science Foundation (U.S.) (NSF Award Number CCF-0937274)
United States. Air Force Office of Scientific Research (AFOSR Award Number FA9550-08-1-0159)
National Science Foundation (U.S.) (NSF Award Number CNS-1035199)