Towards Fault-Tolerance Intuitively in the Disk Array Systems

Bibliographic Details
Main Authors: Chih-Shing Tau, 陶志行
Other Authors: Tzone-I Wang
Format: Others
Language: en_US
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/09237707250009769355
Description
Summary: PhD === National Cheng Kung University === Department of Engineering Science, Master's and Doctoral Program === 94 === To achieve high reliability in disk array systems, an efficient and intuitive decoding method for tolerating disk failures under a parity placement scheme is needed. This dissertation proposes a simple and convenient recovery algorithm (or decoding method) for handling k faulty disks (k generally less than or equal to 3) in a disk array system of N disks. The method uses only modulo-2 arithmetic, parity, and exclusive-OR operations, which makes recovery faster than in schemes that require computation over larger finite fields. To begin with, the data/parity placement problem is transformed into the problem of constructing k(N - 1) linear equations, so that it can be solved directly. The dissertation then illustrates how the method works on a known scheme, EVENODD, which tolerates double disk failures in disk array systems, and on HDD1 (Horizontal and Dual Diagonal) and HDD2, new schemes introduced here to improve on triple-parity placement schemes and enhance the reliability of a disk array system. Moreover, it is inferred that the proposed method may be applied to any known parity placement scheme and may perform even more efficiently. Finally, the dissertation proposes an XOR-based decoding algorithm, Row-Oblique Parity (ROP), for protecting against double disk failures in disk array systems. ROP is provably optimal in computational complexity during both construction and reconstruction, and it is optimal in the amount of redundant information stored and accessed. The simplicity of ROP allowed it to be implemented within the currently available RAID framework. The proposed decoding algorithms are simple, and some steps of the decoding procedure can be executed in parallel, which speeds up recovery from disk failures.
Compared with other schemes in RAID systems, these decoding methods are simpler and more efficient because they can be implemented with currently available software technology. Moreover, most of them require no hardware modification.
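
The idea of turning a parity placement into a set of XOR linear equations can be illustrated with a minimal sketch. This is not the dissertation's decoding algorithm: it assumes the published EVENODD layout (p data disks with p prime, one row-parity disk, one diagonal-parity disk, p - 1 rows per stripe) and solves for the lost symbols with generic Gaussian elimination over GF(2). The names `parity_equations`, `recover`, and `encode` are hypothetical, and each symbol is modelled as a small integer rather than a disk block.

```python
def parity_equations(p):
    """Each EVENODD constraint as a set of (row, disk) cells whose XOR is 0."""
    eqs = []
    for i in range(p - 1):
        # Row parity: data disks 0..p-1 plus the row-parity disk p.
        eqs.append({(i, j) for j in range(p + 1)})
    # Cells of the special diagonal; their XOR is the EVENODD adjuster S.
    s_cells = {(p - 1 - j, j) for j in range(1, p)}
    for i in range(p - 1):
        diag = {((i - j) % p, j) for j in range(p)}
        # Row p-1 is the imaginary all-zero row, so it drops out.
        diag = {cell for cell in diag if cell[0] != p - 1}
        # Diagonal parity on disk p+1 equals S XOR the diagonal's data cells.
        eqs.append((diag ^ s_cells) | {(i, p + 1)})
    return eqs


def recover(stripe, failed, p):
    """Rebuild the disks listed in `failed`, in place, by solving over GF(2)."""
    unknowns = [(i, c) for c in failed for i in range(p - 1)]
    index = {cell: n for n, cell in enumerate(unknowns)}
    rows = []  # (bitmask of unknown cells, XOR of the surviving cells)
    for eq in parity_equations(p):
        mask, rhs = 0, 0
        for cell in eq:
            if cell in index:
                mask ^= 1 << index[cell]
            else:
                rhs ^= stripe[cell[0]][cell[1]]
        rows.append((mask, rhs))
    pivots = []
    for n in range(len(unknowns)):
        bit = 1 << n
        k = next((k for k, (m, _) in enumerate(rows) if m & bit), None)
        if k is None:
            raise ValueError("failure pattern is not recoverable")
        pm, pv = rows.pop(k)
        # Eliminate unknown n from every other equation (Gauss-Jordan).
        rows = [(m ^ pm, v ^ pv) if m & bit else (m, v) for m, v in rows]
        pivots = [(m ^ pm, v ^ pv) if m & bit else (m, v) for m, v in pivots]
        pivots.append((pm, pv))
    for (i, c), (_, value) in zip(unknowns, pivots):
        stripe[i][c] = value


def encode(data, p):
    """Encoding is recovery of the two parity disks from the data disks."""
    stripe = [list(row) + [None, None] for row in data]
    recover(stripe, [p, p + 1], p)
    return stripe


# Demo with p = 5: lose data disks 1 and 3, then rebuild them.
P = 5
data = [[(31 * i + 17 * j + 5) % 256 for j in range(P)] for i in range(P - 1)]
full = encode(data, P)
damaged = [list(row) for row in full]
for i in range(P - 1):
    damaged[i][1] = damaged[i][3] = None
recover(damaged, [1, 3], P)
assert damaged == full
```

Generic elimination is used here only because it works for any XOR-based placement once the equations are written down; its cost grows cubically with the number of lost symbols, which is the kind of overhead a purpose-built decoder would aim to avoid.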