High-Availability Computing Platform with Sensor Fault Resilience
Main Authors: | Yen-Lin Lee, Shinta Nuraisya Arizky, Yu-Ren Chen, Deron Liang, Wei-Jen Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-01-01 |
Series: | Sensors |
Subjects: | failover; high availability; sensor fault; fault detection and recovery; liveness detection |
Online Access: | https://www.mdpi.com/1424-8220/21/2/542 |
id | doaj-9b5b2f00d9c54b458dda8896476b4bf3 |
---|---|
record_format | Article |
citation | Sensors 2021, vol. 21, no. 2, article 542 (published 2021-01-01); ISSN 1424-8220; MDPI AG; English |
doi | 10.3390/s21020542 |
affiliation | Department of Computer Science and Information Engineering, National Central University, Taoyuan 320, Taiwan (all five authors) |
record_timestamp | 2021-01-14T00:05:05Z |
collection | DOAJ |
author | Yen-Lin Lee, Shinta Nuraisya Arizky, Yu-Ren Chen, Deron Liang, Wei-Jen Wang |
title | High-Availability Computing Platform with Sensor Fault Resilience |
issn | 1424-8220 |
description | Modern computing platforms usually use multiple sensors to report system information. To achieve high availability (HA) for the platform, these sensors can be used to efficiently detect system faults that cause a cloud service to lose liveness. However, a sensor may itself fail and thereby disable HA protection. In this case, human intervention is needed, either to change the original fault model or to fix the sensor fault. Therefore, this study proposes an HA mechanism that can continuously provide HA to a cloud system based on dynamic fault model reconstruction. We implemented the proposed HA mechanism on a four-layer OpenStack cloud system and tested its performance for all possible sets of sensor faults. For each fault model, we inject possible system faults and measure the average fault detection time. The experimental results show that the proposed mechanism can accurately detect and recover from an injected system fault even when sensors are disabled. In addition, the system fault detection time increases as the number of sensor faults increases, until the HA mechanism degrades to a one-system-fault model, the worst case, which is equivalent to system-layer heartbeating. |
topic | failover; high availability; sensor fault; fault detection and recovery; liveness detection |