GPGPU Reliability Analysis: From Applications to Large Scale Systems

Over the past decade, GPUs have become an integral part of mainstream high-performance computing (HPC) facilities. Because applications on HPC systems typically run for a long time, any error or failure can cause a significant loss of scientific productivity and system resources. Even worse, si...

Bibliographic Details
Main Author: Nie, Bin
Format: Others
Language: English
Published: W&M ScholarWorks 2019
Online Access: https://scholarworks.wm.edu/etd/1563898932
https://scholarworks.wm.edu/cgi/viewcontent.cgi?article=6802&context=etd