A Hybrid Checkpointing Scheme in Message Passing Systems


Bibliographic Details
Main Authors: Wei-Han Chen, 陳韋翰
Other Authors: LihChyun Shu
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/95930666895997913830
Description
Summary: Master's thesis === National Cheng Kung University (國立成功大學) === Department of Accountancy, Master's and Doctoral Program (會計學系碩博士班) === Academic year 95 === When checkpoint-based protocols are used to achieve fault tolerance, taking checkpoints is not enough: it is also essential to ensure that a consistent global state can be recovered when failures occur. The additional coordination overhead incurred during failure-free execution is unavoidable and reduces performance. Recently, much research has sought to eliminate this overhead, including by analyzing distributed programs and statically inserting checkpoint statements at suitable places in the source code. In this thesis, we propose a hybrid checkpoint scheme that leverages the advantages of both static analysis and online checkpointing. We present an algorithm for finding orphan-free coupling nodes in an extended control flow graph (CFG) and apply it to several commonly used inter-process interaction paradigms. The tightly coupling strategy avoids any trouble path in the CFG, that is, a path along which checkpoint X of one process happened before checkpoint Y of a different process. However, if the application being analyzed may have trouble paths while executing operations in loops, it is unnecessary for the tightly coupling strategy to move the checkpoint statement outside the loop to avoid them. Under our hybrid checkpoint scheme, the extent of rollback after a failure is bounded by at most one checkpoint interval, so the domino effect never occurs.
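
The notion of a consistent global state mentioned in the summary can be illustrated with a minimal sketch. It is not the algorithm from the thesis; it only shows the standard orphan-message test: a set of checkpoints is inconsistent if some message is recorded as received by one process but not yet recorded as sent by another. The names `Message`, `orphan_messages`, `is_consistent`, and the integer event indices are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A message between two processes (hypothetical model, not from the thesis)."""
    sender: str
    send_time: int   # sender's local event index at which the message is sent
    receiver: str
    recv_time: int   # receiver's local event index at which it is received


def orphan_messages(cut, messages):
    """Return messages whose receive is inside the cut but whose send is not.

    `cut` maps each process name to the local event index of its checkpoint;
    events with an index less than or equal to that value are recorded.
    """
    return [
        m for m in messages
        if m.recv_time <= cut[m.receiver] and m.send_time > cut[m.sender]
    ]


def is_consistent(cut, messages):
    """A set of checkpoints is consistent when it contains no orphan message."""
    return not orphan_messages(cut, messages)


# Assumed example execution: P0 sends to P1, later P1 sends to P0.
messages = [
    Message("P0", 2, "P1", 3),
    Message("P1", 5, "P0", 6),
]

good_cut = {"P0": 4, "P1": 4}  # first message is sent and received inside the cut
bad_cut = {"P0": 1, "P1": 4}   # P1 recorded a receive that P0 never recorded sending

print(is_consistent(good_cut, messages))
print(orphan_messages(bad_cut, messages))
```

In this toy model, rolling back to `bad_cut` would force P1 to roll back further to undo the orphan receive. Repeated across processes, that cascade is the domino effect the summary refers to.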