A Thermal-Aware On-Line Fault Tolerance Method for TSV Lifetime Reliability in 3D-NoC Systems


Bibliographic Details
Main Authors: Khanh N. Dang, Akram Ben Ahmed, Abderazek Ben Abdallah, Xuan-Tu Tran
Affiliations: VNU Key Laboratory for Smart Integrated Systems (SISLAB), VNU University of Engineering and Technology (VNU-UET), Vietnam National University, Hanoi (VNU), Hanoi, Vietnam (Khanh N. Dang, Xuan-Tu Tran); National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan (Akram Ben Ahmed); Adaptive Systems Laboratory, The University of Aizu, Aizu-Wakamatsu, Japan (Abderazek Ben Abdallah)
ORCID iDs: Khanh N. Dang 0000-0001-6702-3870; Akram Ben Ahmed 0000-0002-1253-8620; Abderazek Ben Abdallah 0000-0003-3432-0718; Xuan-Tu Tran 0000-0003-4259-9579
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access, vol. 8, pp. 166642-166657
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3022904
Subjects: Fault-tolerance; fault detection; parity check; through silicon via; real-time; thermal aware
Online Access: https://ieeexplore.ieee.org/document/9189765/
Source: DOAJ (record id doaj-f3682c1c3eb24acfbcfeac489780c887)

Abstract

Through-Silicon-Via (TSV) based 3D Integrated Circuits (3D-ICs) are among the most advanced architectures, offering low power consumption, shorter wires, and a smaller footprint. However, 3D-ICs face lifetime reliability challenges due to high operating temperatures and unreliable interconnects, especially the TSVs, whose failures can significantly affect the accuracy of applications. In this paper, we present IaSiG, an online method that detects and corrects lifetime TSV failures. By reusing the conventional recovery method and analyzing the output syndromes, IaSiG can identify and correct the defective TSVs. Results show that within a group, R redundant TSVs can fully localize and correct R defects and detect R + 1 defects. Moreover, with G groups, IaSiG can localize up to G × R defects and detect up to G × (R + 1) defects.

An implementation of IaSiG for 32-bit data with eight groups and two redundancies per group has a worst-case execution time (WCET) of 5,152 cycles while supporting at most 16 defective TSVs (50% localization). By integrating IaSiG into a 3D Network-on-Chip, we also apply a grid-search-based empirical method to insert suitable numbers of redundancies into the TSV groups. Because temperature is one of the major issues in 3D-ICs, the empirical method treats the operating temperature as the factor that accelerates faults. The results show that the proposed method needs fewer redundancies than the uniform method while still maintaining the required Mean Time to Failure (MTTF).
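The per-group bounds stated in the abstract reduce to simple arithmetic. The sketch below computes them; the names `tsv_capacity` and `TsvCapacity` are ours, and we read the "50% localization" figure as G × R localizable defects divided by the 32 data bits, which is an interpretation rather than a definition taken from the paper. These are upper bounds: they are reached only if no group receives more than R (for correction) or R + 1 (for detection) defects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TsvCapacity:
    """Fault-handling bounds for G TSV groups with R redundant TSVs each."""
    localizable: int           # defects that can be localized and corrected
    detectable: int            # defects that can at least be detected
    localization_ratio: float  # localizable defects per data bit (our reading)


def tsv_capacity(groups: int, redundancy: int, data_bits: int) -> TsvCapacity:
    """Apply the per-group rule from the abstract: R spares localize and
    correct R defects and detect R + 1 defects; bounds scale with G."""
    if groups <= 0 or data_bits <= 0 or redundancy < 0:
        raise ValueError("groups/data_bits must be positive, redundancy >= 0")
    localizable = groups * redundancy
    detectable = groups * (redundancy + 1)
    return TsvCapacity(localizable, detectable, localizable / data_bits)


if __name__ == "__main__":
    # Configuration reported in the abstract: 32-bit data, G = 8, R = 2.
    cap = tsv_capacity(groups=8, redundancy=2, data_bits=32)
    print(cap.localizable, cap.detectable, f"{cap.localization_ratio:.0%}")
```

For the reported configuration this gives 16 localizable defects, matching the abstract's "at most 16 defective TSVs", and 24 detectable defects.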
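The abstract does not state the reliability model behind the thermal-aware grid search, so the following is only a sketch under assumptions of ours, not IaSiG's actual method. We assume independent exponential TSV lifetimes, an Arrhenius acceleration factor for temperature (activation energy 0.7 eV, reference 330 K), groups of n data TSVs plus r spares that keep working while at most r TSVs have failed (the localize-and-correct bound above), and groups connected in series. The names `RedundancyPlanner`, `group_reliability`, `group_mttf`, and every numeric parameter (four groups of eight data TSVs, per-group temperatures, `lambda_ref`, the target MTTF) are hypothetical. The grid search enumerates every allocation of 0..`r_max` spares per group and keeps the feasible allocation with the fewest spares; a uniform allocation serves as the baseline.

```python
import itertools
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def acceleration_factor(temp_k, ref_temp_k=330.0, activation_ev=0.7):
    """Arrhenius acceleration relative to ref_temp_k (assumed fault model)."""
    return math.exp(activation_ev / BOLTZMANN_EV * (1 / ref_temp_k - 1 / temp_k))


def group_reliability(t, rate, n_data, spares):
    """P(at most `spares` of the n_data + spares TSVs failed by time t),
    assuming independent exponential TSV lifetimes with the given rate."""
    m = n_data + spares
    p_ok = math.exp(-rate * t)
    return sum(math.comb(m, k) * (1 - p_ok) ** k * p_ok ** (m - k)
               for k in range(spares + 1))


def group_mttf(rate, n_data, spares):
    """Exact MTTF of one group: expected time to the (spares + 1)-th failure."""
    m = n_data + spares
    return sum(1.0 / ((m - i) * rate) for i in range(spares + 1))


class RedundancyPlanner:
    def __init__(self, temps_k, n_data=8, r_max=3, lambda_ref=1e-3, points=2000):
        # Hotter groups fail faster: per-TSV rate scaled by the Arrhenius factor.
        self.rates = [lambda_ref * acceleration_factor(t) for t in temps_k]
        self.n_data, self.r_max = n_data, r_max
        # Extra spares never lower a group's reliability, so every series-system
        # curve is bounded by any single group at r_max; truncate the integral
        # at 20x the weakest such group MTTF, where the tail is negligible.
        t_max = 20 * min(group_mttf(lam, n_data, r_max) for lam in self.rates)
        self.h = t_max / (points - 1)
        times = [i * self.h for i in range(points)]
        # Precompute R_g(t) on the time grid for every (group, spares) pair.
        self.curves = [[[group_reliability(t, lam, n_data, r) for t in times]
                        for r in range(r_max + 1)] for lam in self.rates]

    def system_mttf(self, alloc):
        """Series system MTTF = integral of prod_g R_g(t) dt (trapezoid rule)."""
        cols = [self.curves[g][r] for g, r in enumerate(alloc)]
        r_sys = [math.prod(vals) for vals in zip(*cols)]
        return self.h * (sum(r_sys) - 0.5 * (r_sys[0] + r_sys[-1]))

    def grid_search(self, target):
        """Fewest total spares meeting target; ties go to the higher MTTF."""
        best = None
        for alloc in itertools.product(range(self.r_max + 1), repeat=len(self.rates)):
            mttf = self.system_mttf(alloc)
            if mttf >= target and (best is None or
                                   (sum(alloc), -mttf) < (sum(best[0]), -best[1])):
                best = (alloc, mttf)
        return best

    def uniform(self, target):
        """Baseline: the smallest equal spare count per group meeting target."""
        for r in range(self.r_max + 1):
            alloc = (r,) * len(self.rates)
            mttf = self.system_mttf(alloc)
            if mttf >= target:
                return alloc, mttf
        return None


if __name__ == "__main__":
    planner = RedundancyPlanner(temps_k=[330.0, 340.0, 350.0, 360.0])
    target_years = 20.0  # hypothetical MTTF requirement
    print("grid search:", planner.grid_search(target_years))
    print("uniform:    ", planner.uniform(target_years))
```

Because every uniform allocation is also a point of the grid, the grid-search result can never need more spares than the uniform baseline. Exhaustive enumeration grows as (r_max + 1)^G, however, so for a larger number of groups a pruned or greedy search would likely be needed.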