Design and implementation of a multi-thread unified SIMD graphics processor

Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering === 101 === This thesis presents a low-cost design and implementation of a single-core, multi-threaded unified graphics processing unit (GPU) targeted at embedded graphics applications. The proposed GPU adopts several architectural features that have seldom been found in...

Bibliographic Details
Main Authors: Chao-yi Hsu, 徐肇謚
Other Authors: Yun-Nan Chang
Format: Others
Language: zh-TW
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/84131385918275214476
id ndltd-TW-101NSYS5392070
record_format oai_dc
spelling ndltd-TW-101NSYS5392070 2015-10-13T22:40:49Z http://ndltd.ncl.edu.tw/handle/84131385918275214476 Design and implementation of a multi-thread unified SIMD graphics processor 多執行緒SIMD統一圖形處理器的設計與實作 Chao-yi Hsu 徐肇謚 Master's thesis, National Sun Yat-sen University, Department of Computer Science and Engineering, 101. This thesis presents a low-cost design and implementation of a single-core, multi-threaded unified graphics processing unit (GPU) targeted at embedded graphics applications. The proposed GPU adopts several architectural features that have seldom been found in the related embedded GPU literature. First, in addition to the fundamental vertex and fragment shaders, the proposed GPU also executes software implementations of fixed functions, including clipping, back-face culling, and rasterization, which sit in the middle of the graphics rendering flow. More than several hundred thousand gates can be saved this way. Second, a three-bank, two-port register file architecture is proposed, which further reduces GPU implementation cost by avoiding the use of a four-port register file. Based on this multi-bank register file, the vertex and fragment threads are distributed among and associated with different register banks. Threads from different banks are executed alternately in the GPU datapath in a time-multiplexed manner. When a thread stalls because of a texture miss, it is swapped with another thread in the same bank that is ready to run. Owing to this alternating execution style, the penalty of both RAW (read-after-write) and branch hazards in the 10-stage pipeline GPU is at most one. To avoid redundant processing of the same vertex in the triangle fan and triangle strip modes, a special vertex-fill unit is also implemented. The proposed GPU also realizes a multi-texture function, which supports the mapping of multiple textures. The gate count of the proposed unified multi-threaded graphics processor is approximately 204K (excluding memory).
Yun-Nan Chang 張雲南 2013 thesis 71 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
author2 Yun-Nan Chang
author_facet Yun-Nan Chang
Chao-yi Hsu
徐肇謚
author Chao-yi Hsu
徐肇謚
spellingShingle Chao-yi Hsu
徐肇謚
Design and implementation of a multi-thread unified SIMD graphics processor
author_sort Chao-yi Hsu
title Design and implementation of a multi-thread unified SIMD graphics processor
publishDate 2013
url http://ndltd.ncl.edu.tw/handle/84131385918275214476
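The abstract notes that a vertex-fill unit avoids redundant processing of the same vertex in the triangle fan and strip modes. The sketch below is not the thesis's hardware design; it only illustrates why reuse pays off, by comparing how many vertices would be shaded if every triangle were processed independently against the number of distinct vertices actually present. The function names (`strip_triangles`, `fan_triangles`, `shading_cost`) are hypothetical, and the strip helper ignores the alternating winding order that real strips use.

```python
def strip_triangles(n_vertices):
    """Index triples for a triangle strip over vertices 0..n_vertices-1.

    Illustrative only: winding-order alternation is ignored.
    """
    return [(i, i + 1, i + 2) for i in range(n_vertices - 2)]


def fan_triangles(n_vertices):
    """Index triples for a triangle fan that pivots on vertex 0."""
    return [(0, i, i + 1) for i in range(1, n_vertices - 1)]


def shading_cost(triangles):
    """Return (naive, with_reuse) vertex-shader invocation counts.

    naive      : every triangle shades its three vertices independently.
    with_reuse : each distinct vertex is shaded exactly once.
    """
    naive = 3 * len(triangles)
    with_reuse = len({v for tri in triangles for v in tri})
    return naive, with_reuse


if __name__ == "__main__":
    for name, build in (("strip", strip_triangles), ("fan", fan_triangles)):
        naive, reused = shading_cost(build(6))
        print(f"{name}: naive={naive} with_reuse={reused}")
```

For a strip or fan of n triangles, independent processing touches 3n vertices while only n + 2 are distinct, so the saving approaches a factor of three as n grows.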