Design and implementation of a multi-thread unified SIMD graphics processor
Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering (graduate program) === ROC academic year 101 === This thesis presents a low-cost design and implementation of a single-core, multi-threaded, unified graphics processing unit (GPU) targeted at embedded graphics applications. The proposed GPU adopts several architectural features seldom found in...
Main Authors: | Chao-yi Hsu, 徐肇謚 |
---|---|
Other Authors: | Yun-Nan Chang |
Format: | Others |
Language: | zh-TW |
Published: | 2013 |
Online Access: | http://ndltd.ncl.edu.tw/handle/84131385918275214476 |
id | ndltd-TW-101NSYS5392070 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-101NSYS5392070 2015-10-13T22:40:49Z http://ndltd.ncl.edu.tw/handle/84131385918275214476 Design and implementation of a multi-thread unified SIMD graphics processor 多執行緒SIMD統一圖形處理器的設計與實作 Chao-yi Hsu 徐肇謚 Master's National Sun Yat-sen University Department of Computer Science and Engineering 101 Yun-Nan Chang 張雲南 2013 thesis 71 zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering (graduate program) === ROC academic year 101 === This thesis presents a low-cost design and implementation of a single-core, multi-threaded, unified graphics processing unit (GPU) targeted at embedded graphics applications. The proposed GPU adopts several architectural features seldom found in the embedded GPU literature. First, in addition to the fundamental vertex and fragment shaders, the GPU executes software implementations of the fixed functions used in the middle of the graphics rendering flow, including clipping, back-face culling, and rasterization; replacing this dedicated hardware with software saves hundreds of thousands of gates. Second, a three-bank, two-port register file architecture is proposed, which further reduces implementation cost by avoiding a four-port register file. Vertex and fragment threads are distributed across, and bound to, different banks of this register file, and threads from different banks execute alternately on the GPU datapath in a time-multiplexed fashion. When a thread stalls on a texture miss, it is swapped with another ready thread in the same bank. Because of this alternating execution style, the penalty of both read-after-write (RAW) and branch hazards in the 10-stage GPU pipeline is at most one cycle. To avoid redundantly processing the same vertex in triangle-fan and triangle-strip modes, a dedicated vertex-fill unit is also implemented. The GPU also supports multi-texturing, allowing multiple textures to be mapped. The gate count of the proposed unified multi-threaded graphics processor is approximately 204K, excluding memory. |
author2 | Yun-Nan Chang |
author_facet | Yun-Nan Chang Chao-yi Hsu 徐肇謚 |
author | Chao-yi Hsu 徐肇謚 |
title | Design and implementation of a multi-thread unified SIMD graphics processor |
publishDate | 2013 |
url | http://ndltd.ncl.edu.tw/handle/84131385918275214476 |
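The abstract lists back-face culling among the fixed functions the GPU runs as software on its shader datapath. As a rough illustration of what such a routine computes (not the thesis's shader code), the Python sketch below culls screen-space triangles by the sign of their area. The counter-clockwise-is-front-facing convention and all function names are assumptions.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Tri = Tuple[Vec2, Vec2, Vec2]


def signed_area2(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area of triangle abc in screen space.

    Positive when a -> b -> c winds counter-clockwise (y axis pointing up).
    """
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_back_facing(a: Vec2, b: Vec2, c: Vec2) -> bool:
    """Assumed convention: counter-clockwise triangles face the viewer,
    so clockwise and degenerate (zero-area) triangles are culled."""
    return signed_area2(a, b, c) <= 0.0


def cull_back_faces(triangles: List[Tri]) -> List[Tri]:
    """Keep only the triangles that survive back-face culling."""
    return [t for t in triangles if not is_back_facing(*t)]


if __name__ == "__main__":
    ccw = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
    cw = ((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))
    print(len(cull_back_faces([ccw, cw])))  # prints 1
```

Running this test on every triangle before rasterization discards roughly half of a closed mesh's faces early, which is why it sits ahead of rasterization in the rendering flow.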
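The vertex-fill unit is motivated by vertex sharing: in triangle-fan and triangle-strip modes, consecutive triangles reuse vertices, so shading per triangle repeats work. The sketch below is a hypothetical software model, not the unit's actual design. It expands strip and fan indices into triangles and shades each referenced vertex only once through a cache; `strip_triangles`, `fan_triangles`, and `assemble` are illustrative names.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Triangle = Tuple[int, int, int]


def strip_triangles(n: int) -> List[Triangle]:
    """Index triples for a triangle strip of n vertices.

    Odd-numbered triangles swap their first two indices so that
    every triangle keeps the same winding order.
    """
    tris = []
    for i in range(n - 2):
        if i % 2 == 0:
            tris.append((i, i + 1, i + 2))
        else:
            tris.append((i + 1, i, i + 2))
    return tris


def fan_triangles(n: int) -> List[Triangle]:
    """Index triples for a triangle fan: vertex 0 is shared by every triangle."""
    return [(0, i, i + 1) for i in range(1, n - 1)]


def assemble(tris: List[Triangle], vertices: Sequence[Any],
             shade: Callable[[Any], Any]) -> List[Tuple[Any, ...]]:
    """Shade each referenced vertex once, then build triangles from the cache."""
    cache: Dict[int, Any] = {}
    out = []
    for tri in tris:
        for idx in tri:
            if idx not in cache:
                cache[idx] = shade(vertices[idx])  # first use: run the "shader"
        out.append(tuple(cache[idx] for idx in tri))
    return out


if __name__ == "__main__":
    calls = []

    def shade(v):
        calls.append(v)
        return v * 10  # stand-in for a vertex shader

    verts = list(range(6))
    tris = strip_triangles(len(verts))
    assemble(tris, verts, shade)
    print(len(tris) * 3, len(calls))  # prints 12 6
```

With six strip vertices, four triangles reference twelve vertex slots, but only six shader invocations are needed.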
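The register-bank scheme can be pictured with a toy cycle-level model. Banks are visited round-robin, one instruction issues per cycle, and a thread stalled on a texture miss is rotated out in favor of a ready thread in the same bank. This is an assumption-laden sketch of interleaved multithreading with swap-on-stall, not the thesis's pipeline; the `Thread` and `Bank` classes, the fixed stall length, and the one-instruction-per-visit policy are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Thread:
    name: str
    stall_cycles: int = 0  # hypothetical: cycles left until a texture miss is served

    def ready(self) -> bool:
        return self.stall_cycles == 0


@dataclass
class Bank:
    # The front of the deque is the bank's currently active thread.
    threads: Deque[Thread] = field(default_factory=deque)

    def pick(self) -> Optional[Thread]:
        """Return a ready thread, rotating stalled ones to the back (swap-on-stall)."""
        for _ in range(len(self.threads)):
            if self.threads[0].ready():
                return self.threads[0]
            self.threads.rotate(-1)
        return None  # every thread in this bank is waiting: issue a bubble


def schedule(banks: List[Bank], cycles: int) -> List[str]:
    """Issue at most one instruction per cycle, visiting banks round-robin."""
    trace: List[str] = []
    for cycle in range(cycles):
        # Outstanding texture misses make progress every cycle.
        for bank in banks:
            for th in bank.threads:
                if th.stall_cycles > 0:
                    th.stall_cycles -= 1
        picked = banks[cycle % len(banks)].pick()
        trace.append(picked.name if picked else "bubble")
    return trace


if __name__ == "__main__":
    banks = [
        Bank(deque([Thread("v0"), Thread("v1")])),                  # vertex threads
        Bank(deque([Thread("f0", stall_cycles=3), Thread("f1")])),  # fragment threads
    ]
    print(schedule(banks, 4))  # ['v0', 'f1', 'v0', 'f1']
```

Interleaving banks spaces out consecutive instructions from the same thread, which is the general reason alternating execution can hide short pipeline hazards; here `f0` is swapped out while its miss is pending and `f1` issues instead.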