Modeling performance and power for energy-efficient GPGPU computing

The objective of the proposed research is to develop an analytical model that predicts performance and power for many-core architectures and to propose a mechanism that leverages the model to enable energy-efficient execution of an application. The key insight of the model is to i...

Bibliographic Details
Main Author:	Hong, Sunpyo
Published:	Georgia Institute of Technology 2013
Subjects:	Model; Power; Energy; GPGPU; GPU; Analytical model; Performance; Graphics processing units; Computer architecture; Energy consumption
Online Access:	http://hdl.handle.net/1853/45922

id	ndltd-GATECH-oai-smartech.gatech.edu-1853-45922
record_format	oai_dc
spelling	ndltd-GATECH-oai-smartech.gatech.edu-1853-45922; 2013-05-30T03:06:05Z; Modeling performance and power for energy-efficient GPGPU computing; Hong, Sunpyo; Georgia Institute of Technology; 2013-01-17T22:01:31Z; 2013-01-17T22:01:31Z; 2012-11-12; Dissertation; http://hdl.handle.net/1853/45922
collection	NDLTD
sources	NDLTD
topic	Model; Power; Energy; GPGPU; GPU; Analytical model; Performance; Graphics processing units; Computer architecture; Energy consumption
description	The objective of the proposed research is to develop an analytical model that predicts performance and power for many-core architectures and to propose a mechanism that leverages the model to enable energy-efficient execution of an application. The key insight of the model is to investigate and quantify the complex relationship between thread-level parallelism and memory-level parallelism for an application on a given many-core architecture. Two metrics are proposed: memory warp parallelism (MWP), which refers to the number of overlapping memory accesses per core, and computation warp parallelism (CWP), which characterizes an application type. Combining these metrics with architectural and application parameters yields an estimate of overall application performance. The model uses statically available parameters such as instruction-mix information and input-data size, and its prediction error is 13.3% for the GPU-computing benchmarks. Another important aspect of using many-core architectures is reducing peak power and achieving energy savings. Results obtained with the proposed integrated power and performance (IPP) framework show that different optimization points exist for a GPU architecture depending on the application type. The work shows that activating fewer cores saves 10.99% of run-time energy consumption for the bandwidth-limited benchmarks, and it projects energy savings of 25.8% when core-level power gating is employed. Finally, the model is reformulated in terms of throughput, using OpenCL, to target a wider variety of processors. First, multiple outputs relating to performance are predicted, including upper-bound and lower-bound values. Second, the model parameters are used to assign an application to one of several categories, each with its own suggestions for improving performance and energy efficiency. Third, the accuracy of the predicted bandwidth saturation point is significantly improved by considering independent memory accesses and updating the performance model. Furthermore, the model makes trade-off analysis over architectural and application parameters straightforward, which provides further insight into improving energy efficiency. Future computer systems will contain hundreds of heterogeneous cores, so a workload must be scheduled on the most efficient core or distributed across both types of cores. Preliminary work that uses the analytical model to schedule between CPU and GPU is demonstrated in the appendix. Because no profiling phase is required, the kernel code can be transformed to run more efficiently on the specific architecture. As another extension of the work, the relationship between speed-up and energy efficiency is derived mathematically. Finally, future research ideas are presented on how programmers, compilers, and runtimes could use the model in future heterogeneous systems.
author	Hong, Sunpyo
title	Modeling performance and power for energy-efficient GPGPU computing
publisher	Georgia Institute of Technology
publishDate	2013
url	http://hdl.handle.net/1853/45922
_version_	1716586020360683520

Similar Items