A Text Extraction Algorithm of Complex Document Images
Master's thesis === National Chiao Tung University === College of Electrical Engineering and Computer Science, Master's Degree Program === 91 === Text extraction techniques are widely applied to document images. The complexity of the background image is critical to how well such techniques can be applied. Extracting text from a complex compound document image is an important issue in docu...
Main Authors: | Wen-Pin Wang, 王文賓 |
---|---|
Other Authors: | Ben-Fei Wu |
Format: | Others |
Language: | zh-TW |
Published: | 2003 |
Online Access: | http://ndltd.ncl.edu.tw/handle/65385313010865626593 |
id |
ndltd-TW-091NCTU1706044 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-091NCTU1706044 2016-06-22T04:14:29Z http://ndltd.ncl.edu.tw/handle/65385313010865626593 A Text Extraction Algorithm of Complex Document Images 複雜文件影像的文字抽取技術 Wen-Pin Wang 王文賓 Master's thesis National Chiao Tung University College of Electrical Engineering and Computer Science, Master's Degree Program 91 Text extraction techniques are widely applied to document images. The complexity of the background image is critical to how well such techniques can be applied. Extracting text from a complex compound document image is an important issue in document analysis. The local histogram distribution of a document image reveals many features, and those features are well suited to document image analysis. This thesis presents a text extraction algorithm that extracts text from various compound document images based on features of the local histogram distribution, the size of the text, and the direction of the text strings. A compound document image contains several kinds of objects, including differently colored text, figures, scenes, and complex backgrounds. Such objects may overlap one another. The algorithm can separate text from grayscale or true-color document images, regardless of whether the text overlays a simple, slowly varying, or highly varying background. Experimental results obtained with various document images show that the proposed algorithm can successfully segment Chinese and English text strings from various backgrounds, regardless of whether the text overlaps a simple, slowly varying, or rapidly varying background. Ben-Fei Wu 吳炳飛 2003 thesis 92 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Chiao Tung University === College of Electrical Engineering and Computer Science, Master's Degree Program === 91 === Text extraction techniques are widely applied to document images. The complexity of the background image is critical to how well such techniques can be applied. Extracting text from a complex compound document image is an important issue in document analysis. The local histogram distribution of a document image reveals many features, and those features are well suited to document image analysis.
This thesis presents a text extraction algorithm that extracts text from various compound document images based on features of the local histogram distribution, the size of the text, and the direction of the text strings. A compound document image contains several kinds of objects, including differently colored text, figures, scenes, and complex backgrounds. Such objects may overlap one another. The algorithm can separate text from grayscale or true-color document images, regardless of whether the text overlays a simple, slowly varying, or highly varying background.
Experimental results obtained with various document images show that the proposed algorithm can successfully segment Chinese and English text strings from various backgrounds, regardless of whether the text overlaps a simple, slowly varying, or rapidly varying background.
|
author2 |
Ben-Fei Wu |
author_facet |
Ben-Fei Wu Wen-Pin Wang 王文賓 |
author |
Wen-Pin Wang 王文賓 |
spellingShingle |
Wen-Pin Wang 王文賓 A Text Extraction Algorithm of Complex Document Images |
author_sort |
Wen-Pin Wang |
title |
A Text Extraction Algorithm of Complex Document Images |
title_short |
A Text Extraction Algorithm of Complex Document Images |
title_full |
A Text Extraction Algorithm of Complex Document Images |
title_fullStr |
A Text Extraction Algorithm of Complex Document Images |
title_full_unstemmed |
A Text Extraction Algorithm of Complex Document Images |
title_sort |
text extraction algorithm of complex document images |
publishDate |
2003 |
url |
http://ndltd.ncl.edu.tw/handle/65385313010865626593 |
work_keys_str_mv |
AT wenpinwang atextextractionalgorithmofcomplexdocumentimages AT wángwénbīn atextextractionalgorithmofcomplexdocumentimages AT wenpinwang fùzáwénjiànyǐngxiàngdewénzìchōuqǔjìshù AT wángwénbīn fùzáwénjiànyǐngxiàngdewénzìchōuqǔjìshù AT wenpinwang textextractionalgorithmofcomplexdocumentimages AT wángwénbīn textextractionalgorithmofcomplexdocumentimages |
_version_ |
1718315646342660096 |