Modeling performance and power for energy-efficient GPGPU computing

The objective of the proposed research is to develop an analytical model that predicts performance and power for many-core architectures, and to propose a mechanism that leverages the model to enable energy-efficient execution of an application. The key insight of the model is to i...

Bibliographic Details
Main Author: Hong, Sunpyo
Published: Georgia Institute of Technology 2013
Subjects:
GPU
Online Access: http://hdl.handle.net/1853/45922
id ndltd-GATECH-oai-smartech.gatech.edu-1853-45922
record_format oai_dc
spelling ndltd-GATECH-oai-smartech.gatech.edu-1853-45922; 2013-05-30T03:06:05Z; Modeling performance and power for energy-efficient GPGPU computing; Hong, Sunpyo; Model, Power, Energy, GPGPU, GPU, Analytical model, Performance, Graphics processing units, Computer architecture, Energy consumption; Georgia Institute of Technology; 2013-01-17T22:01:31Z; 2013-01-17T22:01:31Z; 2012-11-12; Dissertation; http://hdl.handle.net/1853/45922
collection NDLTD
sources NDLTD
topic Model
Power
Energy
GPGPU
GPU
Analytical model
Performance
Graphics processing units
Computer architecture
Energy consumption
description The objective of the proposed research is to develop an analytical model that predicts performance and power for many-core architectures, and to propose a mechanism that leverages the model to enable energy-efficient execution of an application. The key insight of the model is to investigate and quantify the complex relationship between thread-level parallelism and memory-level parallelism for an application on a given many-core architecture. Two metrics are proposed: memory warp parallelism (MWP), the number of overlapping memory accesses per core, and computation warp parallelism (CWP), which characterizes the application type. Combining these metrics with architectural and application parameters yields a prediction of overall application performance. The model uses statically available parameters such as instruction-mix information and input-data size, and its prediction error is 13.3% for the GPU-computing benchmarks.
Another important aspect of using many-core architectures is reducing peak power and achieving energy savings. Results obtained with the proposed integrated power and performance (IPP) framework show that different optimization points exist for a GPU architecture depending on the application type. Activating fewer cores saves 10.99% of run-time energy consumption for the bandwidth-limited benchmarks, and energy savings of 25.8% are projected when power gating at the core level is employed.
Finally, the model is recast in terms of throughput, using OpenCL, to target a wider variety of processors. First, multiple performance-related outputs are predicted, including upper-bound and lower-bound values. Second, the model parameters are used to place an application into one of several categories, each with its own suggestions for improving performance and energy efficiency. Third, the accuracy of the bandwidth saturation point is significantly improved by considering independent memory accesses and updating the performance model. Furthermore, trade-off analysis using architectural and application parameters becomes straightforward, which provides more insight into improving energy efficiency.
Future computer systems are expected to contain hundreds of heterogeneous cores, so a workload must be scheduled to an efficient core or distributed across both types of cores. Preliminary work that uses the analytical model to schedule between CPU and GPU is demonstrated in the appendix. Because no profiling phase is required, the kernel code can be transformed to run more efficiently on the specific architecture. As another extension of the work, the relationship between speed-up and energy efficiency is derived mathematically. Finally, future research ideas are presented on how programmers, compilers, and runtimes could use the model in future heterogeneous systems.
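The claim that activating fewer cores can save energy for bandwidth-limited applications can be illustrated with a toy calculation. The sketch below is not the dissertation's IPP model: it assumes throughput that scales linearly with active cores until a memory-bandwidth cap, and power that is a fixed idle cost plus a linear per-core cost. Every function name and number in it is hypothetical and chosen only for illustration.

```python
"""Toy energy model: how many cores should be active? All numbers are made up."""


def runtime_s(active_cores, work_units, per_core_rate, bandwidth_rate):
    # Assumption: throughput grows with the core count until bandwidth caps it.
    throughput = min(active_cores * per_core_rate, bandwidth_rate)
    return work_units / throughput


def power_w(active_cores, idle_power_w, per_core_power_w):
    # Assumption: a fixed idle cost plus a linear cost per active core.
    return idle_power_w + active_cores * per_core_power_w


def energy_j(active_cores, work_units, per_core_rate, bandwidth_rate,
             idle_power_w, per_core_power_w):
    # Energy is average power multiplied by run time.
    return (power_w(active_cores, idle_power_w, per_core_power_w)
            * runtime_s(active_cores, work_units, per_core_rate, bandwidth_rate))


def most_efficient_core_count(max_cores, **params):
    # Exhaustively try every core count and keep the one with the lowest energy.
    return min(range(1, max_cores + 1), key=lambda n: energy_j(n, **params))


# Hypothetical device and workload parameters (not measured values).
HYPOTHETICAL_GPU = dict(work_units=1200.0, per_core_rate=10.0,
                        idle_power_w=60.0, per_core_power_w=4.0)

if __name__ == "__main__":
    cases = [("bandwidth-limited", 200.0), ("compute-bound", 1000.0)]
    for label, bandwidth_rate in cases:
        best = most_efficient_core_count(
            30, bandwidth_rate=bandwidth_rate, **HYPOTHETICAL_GPU)
        joules = energy_j(best, bandwidth_rate=bandwidth_rate, **HYPOTHETICAL_GPU)
        print(f"{label}: {best} active cores, {joules:.0f} J")
```

With these made-up values the bandwidth-limited case is cheapest at 20 of 30 cores (840 J, against 1080 J with all cores active), while the compute-bound case is cheapest with all 30 cores. This matches the qualitative point that the optimal operating point depends on the application type; the actual savings reported in the abstract come from the author's measurements and model, not from this sketch.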
author Hong, Sunpyo
title Modeling performance and power for energy-efficient GPGPU computing
publisher Georgia Institute of Technology
publishDate 2013
url http://hdl.handle.net/1853/45922