Design and Implementation of Vector Graphics Accelerator Equipped with a Tessellation Processor

Doctoral dissertation === National Sun Yat-sen University === Department of Computer Science and Engineering === 102 === This dissertation presents an efficient and fast VLSI architecture for a vector graphics hardware accelerator. To render a vector graphics object, the contours of the paths that constitute the object must be plotted first. The outline of a path is described by a...

Bibliographic Details
Main Authors: Ting-Chi Tung, 董庭吉
Other Authors: Yun-Nan Chang
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/96290413858843047703
id ndltd-TW-102NSYS5392019
record_format oai_dc
collection NDLTD
language en_US
format Others
sources NDLTD
description Doctoral dissertation === National Sun Yat-sen University === Department of Computer Science and Engineering === 102 === This dissertation presents an efficient and fast VLSI architecture for a vector graphics hardware accelerator. To render a vector graphics object, the contours of the paths that constitute the object must be plotted first. The outline of a path is described by a series of parametric segments, of which the most frequently used type is the cubic Bézier curve (CBC). This dissertation presents an efficient intersection locator that generates the intersection information between cubic Bézier curves and scan-lines required by the two-dimensional (2D) graphics rasterization process. The conventional method of calculating an intersection point first approximates the curve with a sufficient number of line segments and then solves the pair of simultaneous equations formed by each line segment and scan-line. By extending the adaptive-forward-difference (AFD) algorithm to choose suitable successive sampling points on the curve, the proposed design not only locates the intersection points precisely but, more importantly, avoids complex functional units such as multipliers and dividers, which are typically needed to solve simultaneous equations. The intersection information generated by the locator achieves over 99.6% accuracy. To tessellate the stroke contours of paths, this dissertation decomposes them into a series of concatenated circular arcs and line segments. By transforming circular arcs and lines into approximating cubic Bézier curves, all types of stroke contours can be plotted by homogeneous CBC processing modules. All the modules can be folded into a design that contains only one CBC tessellation core, which handles all curve tessellation tasks, including the circular arcs and straight lines of stroke contours. To further reduce the control complexity of using the CBC core, a tessellation processor is proposed that can be programmed to realize all the stroke functions.
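The arc-to-CBC transformation mentioned in the abstract can be illustrated with the widely used tangent-length approximation, in which a circular arc of angle theta is replaced by one cubic Bézier curve whose inner control points lie at distance (4/3)·tan(theta/4)·r along the end tangents. This is only a minimal sketch under that assumption; the abstract does not state which approximation the dissertation uses, and the function names `arc_to_cubic` and `cubic_point` are hypothetical. The sketch also calls the math library for sine and cosine, whereas the abstract describes look-up tables for those functions.

```python
import math

def arc_to_cubic(cx, cy, r, a0, a1):
    """Approximate a counter-clockwise circular arc (angles in radians,
    |a1 - a0| <= pi/2 recommended) by one cubic Bezier curve.
    Assumption: standard tangent-length formula, not necessarily the
    dissertation's method."""
    k = (4.0 / 3.0) * math.tan((a1 - a0) / 4.0)
    p0 = (cx + r * math.cos(a0), cy + r * math.sin(a0))
    p3 = (cx + r * math.cos(a1), cy + r * math.sin(a1))
    # Inner control points sit on the tangents at the two end points.
    p1 = (p0[0] - k * r * math.sin(a0), p0[1] + k * r * math.cos(a0))
    p2 = (p3[0] + k * r * math.sin(a1), p3[1] - k * r * math.cos(a1))
    return p0, p1, p2, p3

def cubic_point(ctrl, t):
    """Evaluate a cubic Bezier curve at parameter t (Bernstein form)."""
    u = 1.0 - t
    w = (u * u * u, 3.0 * u * u * t, 3.0 * u * t * t, t * t * t)
    x = sum(wi * p[0] for wi, p in zip(w, ctrl))
    y = sum(wi * p[1] for wi, p in zip(w, ctrl))
    return x, y

if __name__ == "__main__":
    quarter = arc_to_cubic(0.0, 0.0, 1.0, 0.0, math.pi / 2)
    worst = max(abs(math.hypot(*cubic_point(quarter, i / 100)) - 1.0)
                for i in range(101))
    print("control points:", quarter)
    print("largest radial deviation:", worst)
```

Once arcs and lines are expressed this way, the same cubic evaluator can process every contour segment, which is the point of the homogeneous CBC modules described above.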
The burden of complex control circuit design can then be shifted to the coding of the control software. In addition to a main CBC core, the tessellation processor contains several look-up tables (LUTs) that realize complex arithmetic functions such as reciprocal, division, sine, and cosine, which are used while deriving the shell of the path. After the outlines of the vector graphics object have been tessellated, the filling regions of the object must be found so that they can be filled with color. To determine the filling regions, a large on-chip scan-line buffer (SB) is commonly used and frequently accessed to derive each pixel’s winding count. This dissertation proposes a special 2-bit coding scheme for each buffer entry, along with the active-edge-table rescan method, to record the intersection information of scan-lines and the object paths. In addition, for anti-aliasing (AA) rendering applications, a coverage buffer is proposed to avoid duplicating SBs. Compared with the conventional approach, the required buffer size can be reduced by up to 89%. Besides buffer reduction, this dissertation also proposes a hierarchical SB architecture in which the upper-level buffer indicates which scan-line sections intersect objects, so that accesses to successive buffer entries can be skipped. The same technique, along with the differential coverage transformation, can also be applied to the coverage buffer. The experimental results show that more than 87% of memory accesses can be eliminated, which saves 66.4% of clock cycles in a practical hardware implementation. The proposed vector graphics accelerator has been implemented with the SAED 90 nm library and can run at up to 150 MHz. The total gate count of the rendering accelerator is about 225.06k, of which the tessellation processor and the rasterization accelerator consume about 199.32k and 16.76k gates, respectively.
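For readers unfamiliar with winding counts, the following minimal sketch shows how a conventional scan-line buffer can turn edge crossings into filled pixels. It illustrates the baseline idea only: each buffer entry here holds a full signed integer, and the 2-bit coding scheme, the active-edge-table rescan, and the hierarchical buffer proposed in the dissertation are not reproduced because the abstract does not specify them. The function name `fill_scanline` and its parameters are hypothetical.

```python
def fill_scanline(width, crossings, rule="nonzero"):
    """Return a list of booleans marking the filled pixels of one scan-line.

    crossings: iterable of (x_pixel, direction) pairs, where direction is
    +1 for an edge going up and -1 for an edge going down.
    Assumption: a plain signed-delta buffer, not the dissertation's coding.
    """
    buf = [0] * (width + 1)          # scan-line buffer of winding deltas
    for x, direction in crossings:
        x = min(max(x, 0), width)    # clamp crossings to the buffer range
        buf[x] += direction
    winding = 0
    filled = []
    for x in range(width):
        winding += buf[x]            # running sum is the winding count
        if rule == "nonzero":
            filled.append(winding != 0)
        else:                        # even-odd rule
            filled.append(winding % 2 != 0)
    return filled

if __name__ == "__main__":
    nested = [(1, 1), (3, 1), (5, -1), (7, -1)]
    print(fill_scanline(8, nested, "nonzero"))
    print(fill_scanline(8, nested, "evenodd"))
```

The sketch reads every buffer entry on every scan-line, which is the access pattern that a hierarchical buffer with a section-level summary is meant to cut down.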
The accelerator can render the Tiger object at over 45 frames per second (FPS) at a resolution of 240x320 pixels. The proposed accelerator improves the overall rendering speed and is suitable for dedicated embedded applications.
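The intersection locator summarized in this description builds on forward differencing, which can be sketched as follows. After a one-time setup, each new sample of a cubic Bézier coordinate is obtained with three additions and no multiplier or divider. This is a simplified illustration with a fixed step size; the adaptive step adjustment of AFD and the dissertation's extension for choosing sampling points are not shown, and the names `forward_difference_samples` and `scanline_crossings` are hypothetical.

```python
import math

def forward_difference_samples(p0, p1, p2, p3, steps):
    """Sample one coordinate of a cubic Bezier at t = i/steps.
    Only the setup uses multiplications; the loop uses additions."""
    # Power-basis coefficients: B(t) = a*t^3 + b*t^2 + c*t + d
    a = -p0 + 3 * p1 - 3 * p2 + p3
    b = 3 * p0 - 6 * p1 + 3 * p2
    c = -3 * p0 + 3 * p1
    h = 1.0 / steps
    f = float(p0)
    d1 = a * h ** 3 + b * h ** 2 + c * h   # first forward difference at t=0
    d2 = 6 * a * h ** 3 + 2 * b * h ** 2   # second forward difference at t=0
    d3 = 6 * a * h ** 3                    # third difference (constant)
    samples = [f]
    for _ in range(steps):
        f += d1
        d1 += d2
        d2 += d3
        samples.append(f)
    return samples

def scanline_crossings(ctrl, steps=256):
    """Map each integer scan-line y = k to the x values of the first
    sample at or beyond it (an approximation limited by the step size)."""
    xs = forward_difference_samples(*[p[0] for p in ctrl], steps)
    ys = forward_difference_samples(*[p[1] for p in ctrl], steps)
    crossings = {}
    for i in range(1, len(ys)):
        lo, hi = sorted((ys[i - 1], ys[i]))
        k = math.floor(lo) + 1             # first scan-line strictly above lo
        while k <= hi:
            crossings.setdefault(k, []).append(xs[i])
            k += 1
    return crossings

if __name__ == "__main__":
    curve = [(0, 0), (4, 1), (-1, 3), (3, 4)]
    for k, xvals in sorted(scanline_crossings(curve).items()):
        print(k, xvals)
```

A fixed step either oversamples flat parts of the curve or undersamples steep ones, which is the motivation for adapting the step as AFD does.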
author2 Yun-Nan Chang
author_facet Yun-Nan Chang
Ting-Chi Tung
董庭吉
author Ting-Chi Tung
董庭吉
title Design and Implementation of Vector Graphics Accelerator Equipped with a Tessellation Processor
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/96290413858843047703
spelling ndltd-TW-102NSYS5392019 2017-03-22T04:42:37Z http://ndltd.ncl.edu.tw/handle/96290413858843047703 Design and Implementation of Vector Graphics Accelerator Equipped with a Tessellation Processor 搭載曲線細分處理器之向量圖形加速器設計與實作 Ting-Chi Tung 董庭吉 博士 國立中山大學 資訊工程學系研究所 102 Yun-Nan Chang 張雲南 2014 學位論文 ; thesis 153 en_US