Self-correcting strategy for networks-on-chip interconnect

Networks-on-Chip (NoC) interconnection provides an on-chip communication strategy for a large number of processing elements in a System-on-Chip. Fault tolerance is a challenge for modern NoCs due to the increase in physical defects in advanced manufacturing processes. A key requirement for modern NoCs i...

Bibliographic Details
Main Author: Liu, Junxiu
Published: Ulster University 2015
Subjects:
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.675467
id ndltd-bl.uk-oai-ethos.bl.uk-675467
record_format oai_dc
collection NDLTD
sources NDLTD
topic 004.6
description Networks-on-Chip (NoC) interconnection provides an on-chip communication strategy for a large number of processing elements in a System-on-Chip. Fault tolerance is a challenge for modern NoCs due to the increase in physical defects in advanced manufacturing processes. A key requirement for modern NoCs is the ability to detect faults and failures and to self-correct after faults occur, thereby maintaining a level of system functionality. However, existing fault-tolerant approaches cannot fully address system scalability and fault testing with minimal intrusion; in addition, they fail to provide robust self-correction strategies under complex traffic conditions. Therefore, new fault detection and self-correction strategies are needed to address this reliability issue and to enable the design of reliable systems on unreliable fabrics. This thesis presents a novel online fault detection strategy that minimises intrusion on runtime operation during testing. If a channel is faulty, an alert flag is raised. Using this alert flag mechanism, three novel fault-tolerant adaptive routing algorithms are proposed to provide self-correcting strategies for NoCs. They exploit the status of real-time traffic with look-ahead functions at different levels (local or regional), calculate weights for output directions or path candidates, and choose the path with the lowest weight to forward the packets. The key benefit of these routing algorithms is that they bypass routing paths with faulty channels while minimising congestion on the adjacent connected channels. Detailed experimental results are given for a range of testing conditions, traffic patterns and fault rates; they demonstrate that faults can be detected promptly with minimal intrusion and that the routing algorithms maintain a level of system functionality under high fault rates at low cost. In particular, the experimental results demonstrate that the proposed detection and self-correction strategy achieves an overall improvement of between 24% and 62% in throughput degradation under varied high fault rates compared to benchmarks. The thesis also presents an open-source monitoring mechanism that provides a means of evaluation and benchmarking to quantitatively analyse a hardware NoC system's fault-tolerant capability. Using this monitoring mechanism, the thesis concludes with verification of the detection and self-correction algorithms in FPGA hardware. The FPGA implementations present the throughput performance, fault-tolerant capabilities and resource costs of the three fault-tolerant adaptive routing algorithms; in particular, they demonstrate the real-time operation of the proposed self-correction strategies in hardware in the presence of varied levels of faults.
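The weight-based selection outlined in the description (raise an alert flag for a faulty channel, weight each candidate output, forward on the lowest weight) might be sketched roughly as below. This is only an illustration under stated assumptions and not the thesis's actual algorithms: the Channel fields, the load measures, the lookahead_factor parameter and the weight formula are all hypothetical names and choices introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical convention: a channel whose alert flag is raised gets an
# infinite weight, so it is never preferred over a healthy channel.
FAULT_WEIGHT = float("inf")


@dataclass
class Channel:
    direction: str        # output direction, e.g. "N", "E", "S", "W"
    faulty: bool          # alert flag raised by the fault detector
    local_load: int       # assumed congestion measure at this router
    downstream_load: int  # assumed look-ahead congestion measure


def weight(ch: Channel, lookahead_factor: float = 0.5) -> float:
    """Combine local and look-ahead congestion into a single weight."""
    if ch.faulty:
        return FAULT_WEIGHT
    return ch.local_load + lookahead_factor * ch.downstream_load


def select_output(candidates: List[Channel]) -> Optional[str]:
    """Return the direction with the lowest weight, or None if no
    healthy candidate channel exists."""
    if not candidates:
        return None
    best = min(candidates, key=weight)
    if weight(best) == FAULT_WEIGHT:
        return None  # every candidate channel is flagged as faulty
    return best.direction


candidates = [
    Channel("E", faulty=True, local_load=0, downstream_load=0),
    Channel("N", faulty=False, local_load=3, downstream_load=2),
    Channel("S", faulty=False, local_load=1, downstream_load=8),
]
print(select_output(candidates))
```

In this sketch the idle but faulty "E" channel is bypassed, and "N" is chosen over "S" because the look-ahead term penalises the congestion further along the "S" path.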
author Liu, Junxiu
title Self-correcting strategy for networks-on-chip interconnect
publisher Ulster University
publishDate 2015
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.675467
work_keys_str_mv AT liujunxiu selfcorrectingstrategyfornetworksonchipinterconnect
_version_ 1718368913316642816