High-Availability Computing Platform with Sensor Fault Resilience

Modern computing platforms usually use multiple sensors to report system information. In order to achieve high availability (HA) for the platform, the sensors can be used to efficiently detect system faults that make a cloud service not live. However, a sensor may fail and disable HA protection. In...


Bibliographic Details
Main Authors: Yen-Lin Lee, Shinta Nuraisya Arizky, Yu-Ren Chen, Deron Liang, Wei-Jen Wang
Format: Article
Language: English
Published: MDPI AG 2021-01-01
Series: Sensors
Subjects: failover; high availability; sensor fault; fault detection and recovery; liveness detection
Online Access: https://www.mdpi.com/1424-8220/21/2/542
id doaj-9b5b2f00d9c54b458dda8896476b4bf3
record_format Article
spelling doaj-9b5b2f00d9c54b458dda8896476b4bf3
2021-01-14T00:05:05Z
eng
MDPI AG
Sensors
1424-8220
2021-01-01
21
542
542
10.3390/s21020542
High-Availability Computing Platform with Sensor Fault Resilience
Yen-Lin Lee, Department of Computer Science and Information Engineering, National Central University, Taoyuan 320, Taiwan
Shinta Nuraisya Arizky, Department of Computer Science and Information Engineering, National Central University, Taoyuan 320, Taiwan
Yu-Ren Chen, Department of Computer Science and Information Engineering, National Central University, Taoyuan 320, Taiwan
Deron Liang, Department of Computer Science and Information Engineering, National Central University, Taoyuan 320, Taiwan
Wei-Jen Wang, Department of Computer Science and Information Engineering, National Central University, Taoyuan 320, Taiwan
Modern computing platforms usually use multiple sensors to report system information. In order to achieve high availability (HA) for the platform, the sensors can be used to efficiently detect system faults that make a cloud service not live. However, a sensor may fail and disable HA protection. In this case, human intervention is needed, either to change the original fault model or to fix the sensor fault. Therefore, this study proposes an HA mechanism that can continuously provide HA to a cloud system based on dynamic fault model reconstruction. We have implemented the proposed HA mechanism on a four-layer OpenStack cloud system and tested the performance of the proposed mechanism for all possible sets of sensor faults. For each fault model, we inject possible system faults and measure the average fault detection time. The experimental results show that the proposed mechanism can accurately detect and recover from an injected system fault even when sensors are disabled. In addition, the system fault detection time increases as the number of sensor faults increases, until the HA mechanism is degraded to a one-system-fault model, which is the worst case, equivalent to heartbeating at the system layer.
https://www.mdpi.com/1424-8220/21/2/542
failover
high availability
sensor fault
fault detection and recovery
liveness detection
collection DOAJ
language English
format Article
sources DOAJ
author Yen-Lin Lee
Shinta Nuraisya Arizky
Yu-Ren Chen
Deron Liang
Wei-Jen Wang
author_sort Yen-Lin Lee
title High-Availability Computing Platform with Sensor Fault Resilience
publisher MDPI AG
series Sensors
issn 1424-8220
publishDate 2021-01-01
description Modern computing platforms usually use multiple sensors to report system information. In order to achieve high availability (HA) for the platform, the sensors can be used to efficiently detect system faults that make a cloud service not live. However, a sensor may fail and disable HA protection. In this case, human intervention is needed, either to change the original fault model or to fix the sensor fault. Therefore, this study proposes an HA mechanism that can continuously provide HA to a cloud system based on dynamic fault model reconstruction. We have implemented the proposed HA mechanism on a four-layer OpenStack cloud system and tested the performance of the proposed mechanism for all possible sets of sensor faults. For each fault model, we inject possible system faults and measure the average fault detection time. The experimental results show that the proposed mechanism can accurately detect and recover from an injected system fault even when sensors are disabled. In addition, the system fault detection time increases as the number of sensor faults increases, until the HA mechanism is degraded to a one-system-fault model, which is the worst case, equivalent to heartbeating at the system layer.
topic failover
high availability
sensor fault
fault detection and recovery
liveness detection
url https://www.mdpi.com/1424-8220/21/2/542
_version_ 1724338585547571200