Special Typeface Identification in Chinese Document Images
Master's === Da-Yeh University === Master's Program, Department of Information Management === 93 === Optical character recognition (OCR) has been a prominent research subject for the past twenty years. Digitizing paper documents with OCR techniques reduces document storage space. These digitized document images can be classified and retrieved conveniently....
Main Authors: | Lin Yu-Yuan, 林裕淵 |
---|---|
Other Authors: | Tseng Yi-Hong |
Format: | Others |
Language: | zh-TW |
Published: | 2005 |
Online Access: | http://ndltd.ncl.edu.tw/handle/40694443264057230071 |
id |
ndltd-TW-093DYU00396018 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-093DYU00396018 2015-10-13T11:39:44Z http://ndltd.ncl.edu.tw/handle/40694443264057230071 Special Typeface Identification in Chinese Document Images 中文文件影像中之特殊字體偵測 Lin Yu-Yuan 林裕淵 Master's Da-Yeh University Master's Program, Department of Information Management 93 Tseng Yi-Hong 曾逸鴻 2005 thesis 51 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Da-Yeh University === Master's Program, Department of Information Management === 93 === Optical character recognition (OCR) has been a prominent research subject for the past twenty years. Digitizing paper documents with OCR techniques reduces document storage space, and the digitized document images can be classified and retrieved conveniently.
At present, commercial OCR products purport to deliver satisfactory results, with recognition accuracy above 90%. This accuracy is generally measured on printed characters in normal typefaces. For special typefaces such as italic, underlined, hollow, and boldface, however, commercial OCR systems achieve poor recognition accuracy. Because the number of Chinese characters is large, recognition with a multi-engine OCR system is slow. This paper proposes an approach to detect all characters in special typefaces. In the proposed typeface identification system, text lines and character components are extracted by analyzing the projection profiles of text block images. Several characteristics, such as component sizes, gaps between adjacent components, stroke widths, and black run lengths, are then computed and analyzed to identify the typeface of each character. Finally, a recognition engine specific to the identified typeface is applied to recognize each unknown character.
|
author2 |
Tseng Yi-Hong |
author_facet |
Tseng Yi-Hong Lin Yu-Yuan 林裕淵 |
author |
Lin Yu-Yuan 林裕淵 |
spellingShingle |
Lin Yu-Yuan 林裕淵 Special Typeface Identification in Chinese Document Images |
author_sort |
Lin Yu-Yuan |
title |
Special Typeface Identification in Chinese Document Images |
title_short |
Special Typeface Identification in Chinese Document Images |
title_full |
Special Typeface Identification in Chinese Document Images |
title_fullStr |
Special Typeface Identification in Chinese Document Images |
title_full_unstemmed |
Special Typeface Identification in Chinese Document Images |
title_sort |
special typeface identification in chinese document images |
publishDate |
2005 |
url |
http://ndltd.ncl.edu.tw/handle/40694443264057230071 |
work_keys_str_mv |
AT linyuyuan specialtypefaceidentificationinchinesedocumentimages AT línyùyuān specialtypefaceidentificationinchinesedocumentimages AT linyuyuan zhōngwénwénjiànyǐngxiàngzhōngzhītèshūzìtǐzhēncè AT línyùyuān zhōngwénwénjiànyǐngxiàngzhōngzhītèshūzìtǐzhēncè |
_version_ |
1716847054309818368 |
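The abstract says that text lines and character components are extracted from projection profiles, and that black run lengths are among the features used for typeface identification. The following is a minimal sketch of those two ideas, not the thesis's implementation. It assumes a binarized text block stored as a list of rows with 1 for black pixels and 0 for white, and a threshold of zero ink to separate lines and components. All function names are hypothetical.

```python
def horizontal_profile(image):
    """Number of black pixels in each row."""
    return [sum(row) for row in image]


def vertical_profile(image):
    """Number of black pixels in each column."""
    return [sum(col) for col in zip(*image)]


def runs_above(profile, threshold=0):
    """Return (start, end) index pairs, end exclusive, where profile > threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment(image):
    """Split a text block into lines, then each line into components.

    Returns a list of ((top, bottom), [(left, right), ...]) entries.
    """
    result = []
    for top, bottom in runs_above(horizontal_profile(image)):
        line = image[top:bottom]
        components = runs_above(vertical_profile(line))
        result.append(((top, bottom), components))
    return result


def black_run_lengths(row):
    """Lengths of consecutive black-pixel runs in one scan line."""
    lengths = []
    run = 0
    for pixel in row:
        if pixel:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


# A tiny made-up text block: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

layout = segment(page)
```

On this toy input, `layout` holds two lines with two components each, and `black_run_lengths(page[4])` gives the run lengths of the second line's single row. Component sizes and gaps between components follow directly from the `(left, right)` pairs. How the thesis combines such features into a typeface decision is not stated in the abstract, so that step is left out here.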