Special Typeface Identification in Chinese Document Images

Master's thesis === Da-Yeh University === Master's Program, Department of Information Management === 93 === Optical character recognition (OCR) has been an active research subject for the past twenty years. Digitizing paper documents with OCR techniques reduces document storage space, and the resulting document images can be classified and retrieved conveniently....


Bibliographic Details
Main Authors: Lin Yu-Yuan, 林裕淵
Other Authors: Tseng Yi-Hong
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/40694443264057230071
id ndltd-TW-093DYU00396018
record_format oai_dc
spelling ndltd-TW-093DYU00396018 2015-10-13T11:39:44Z http://ndltd.ncl.edu.tw/handle/40694443264057230071 Special Typeface Identification in Chinese Document Images 中文文件影像中之特殊字體偵測 Lin Yu-Yuan 林裕淵 Master's thesis, Da-Yeh University, Master's Program, Department of Information Management, 93 Tseng Yi-Hong 曾逸鴻 2005 thesis 51 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === Da-Yeh University === Master's Program, Department of Information Management === 93 === Optical character recognition (OCR) has been an active research subject for the past twenty years. Digitizing paper documents with OCR techniques reduces document storage space, and the resulting document images can be classified and retrieved conveniently. At present, commercial OCR products claim satisfactory recognition results, with recognition accuracy above 90%. This accuracy is generally measured on printed characters in normal typefaces. For special typefaces such as italic, underlined, hollow, and boldface, however, commercial OCR systems achieve poor recognition accuracy. Because the number of Chinese characters is large, recognition with a multi-engine OCR system is slow. This paper proposes an approach to detect all characters set in special typefaces. In the proposed typeface identification system, text lines and character components are extracted by analyzing the projection profiles of text block images. Then several characteristics, such as component sizes, gaps between components, stroke widths, and black run lengths, are computed and analyzed to identify the typeface of each character. Afterward, a specific recognition engine is applied to each unknown character according to its typeface identification result.
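The segmentation and feature steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the binary-image representation (1 = black pixel), the function names, and the `min_ink` threshold are all assumptions made for illustration. It shows how projection profiles yield text-line and component ranges, and how gaps and black run lengths could be measured from them.

```python
from typing import List, Tuple

# Assumed representation: a binary image as rows of 0/1, where 1 is a black (ink) pixel.
Image = List[List[int]]


def horizontal_profile(img: Image) -> List[int]:
    """Count black pixels in every row (used to find text lines)."""
    return [sum(row) for row in img]


def vertical_profile(img: Image) -> List[int]:
    """Count black pixels in every column (used to find character components)."""
    if not img:
        return []
    return [sum(col) for col in zip(*img)]


def segments(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is >= min_ink.

    min_ink is a hypothetical threshold, not a value taken from the thesis.
    """
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of inked rows/columns begins
        elif value < min_ink and start is not None:
            spans.append((start, i))       # the run ends at the first blank index
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def gaps(spans: List[Tuple[int, int]]) -> List[int]:
    """Widths of the blank gaps between consecutive segments."""
    return [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]


def black_run_lengths(row: List[int]) -> List[int]:
    """Lengths of consecutive black-pixel runs in a single row."""
    runs = []
    length = 0
    for pixel in row:
        if pixel:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:
        runs.append(length)
    return runs


if __name__ == "__main__":
    img = [
        [0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 0],
        [1, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0],
    ]
    lines = segments(horizontal_profile(img))
    top, bottom = lines[0]
    components = segments(vertical_profile(img[top:bottom]))
    print(lines, components, gaps(components), black_run_lengths(img[1]))
```

Features such as component widths (end minus start of each range), gap widths, and run-length statistics would then feed whatever decision rule separates normal from special typefaces; the abstract does not specify that rule, so none is sketched here.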
author2 Tseng Yi-Hong
author Lin Yu-Yuan
林裕淵
title Special Typeface Identification in Chinese Document Images
publishDate 2005
url http://ndltd.ncl.edu.tw/handle/40694443264057230071