A Text Extraction Algorithm of Complex Document Images

Master's === National Chiao Tung University === College of Electrical Engineering and Computer Science, Master's Degree Program === 91 === Text extraction techniques are widely applied to document images. The complexity of the background image is critical to the application of text extraction techniques. Extracting text from a complex compound document image is an important issue in docu...

Bibliographic Details
Main Authors: Wen-Pin Wang, 王文賓
Other Authors: Ben-Fei Wu
Format: Others
Language: zh-TW
Published: 2003
Online Access: http://ndltd.ncl.edu.tw/handle/65385313010865626593
id ndltd-TW-091NCTU1706044
record_format oai_dc
spelling ndltd-TW-091NCTU1706044 2016-06-22T04:14:29Z http://ndltd.ncl.edu.tw/handle/65385313010865626593 A Text Extraction Algorithm of Complex Document Images 複雜文件影像的文字抽取技術 Wen-Pin Wang 王文賓 Master's National Chiao Tung University College of Electrical Engineering and Computer Science, Master's Degree Program 91 Text extraction techniques are widely applied to document images. The complexity of the background image is critical to the application of text extraction techniques. Extracting text from a complex compound document image is an important issue in document analysis. The local histogram distribution of a document image reveals many features, and these features are well suited to document image analysis. This thesis presents a text extraction algorithm that extracts text from different compound document images based on features of the local histogram distribution, the size of the text, and the direction of the text strings. A compound document image contains several kinds of objects, including differently colored text, figures, scenes, and complex backgrounds, and these objects may overlap one another. The algorithm can separate text from grayscale or true-color document images, regardless of whether the text overlays a simple, slowly varying, or highly varying background. Experimental results on various document images show that the proposed algorithm successfully segments Chinese and English text strings from simple, slowly varying, and rapidly varying backgrounds. Ben-Fei Wu 吳炳飛 2003 thesis 92 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chiao Tung University === College of Electrical Engineering and Computer Science, Master's Degree Program === 91 === Text extraction techniques are widely applied to document images. The complexity of the background image is critical to the application of text extraction techniques. Extracting text from a complex compound document image is an important issue in document analysis. The local histogram distribution of a document image reveals many features, and these features are well suited to document image analysis. This thesis presents a text extraction algorithm that extracts text from different compound document images based on features of the local histogram distribution, the size of the text, and the direction of the text strings. A compound document image contains several kinds of objects, including differently colored text, figures, scenes, and complex backgrounds, and these objects may overlap one another. The algorithm can separate text from grayscale or true-color document images, regardless of whether the text overlays a simple, slowly varying, or highly varying background. Experimental results on various document images show that the proposed algorithm successfully segments Chinese and English text strings from simple, slowly varying, and rapidly varying backgrounds.
author2 Ben-Fei Wu
author_facet Ben-Fei Wu
Wen-Pin Wang
王文賓
author Wen-Pin Wang
王文賓
title A Text Extraction Algorithm of Complex Document Images
publishDate 2003
url http://ndltd.ncl.edu.tw/handle/65385313010865626593