A Text Extraction Algorithm of Complex Document Images

Master's thesis === National Chiao Tung University === Master's Degree Program, College of Electrical Engineering and Computer Science === 91 === Text extraction techniques are widely applied to document images. The complexity of the background image is critical to the application of text extraction techniques. Extracting text from a complex compound document image is an important issue in docu...


Bibliographic Details
Main Authors:	Wen-Pin Wang, 王文賓
Other Authors:	Ben-Fei Wu
Format:	Others
Language:	zh-TW
Published:	2003
Online Access:	http://ndltd.ncl.edu.tw/handle/65385313010865626593

id	ndltd-TW-091NCTU1706044
record_format	oai_dc
spelling	ndltd-TW-091NCTU1706044 2016-06-22T04:14:29Z http://ndltd.ncl.edu.tw/handle/65385313010865626593 A Text Extraction Algorithm of Complex Document Images 複雜文件影像的文字抽取技術 Wen-Pin Wang 王文賓 Master's thesis; National Chiao Tung University; Master's Degree Program, College of Electrical Engineering and Computer Science; 91. Text extraction techniques are widely applied to document images. The complexity of the background image is critical to the application of text extraction techniques. Extracting text from a complex compound document image is an important issue in document analysis. The local histogram distribution of a document image reveals many features, and those features are well suited to document image analysis. This thesis presents a text extraction algorithm that extracts text from different compound document images based on features of the local histogram distribution, the size of the text, and the direction of the text strings. A compound document image contains several kinds of objects, including differently colored text, figures, scenes, and complex backgrounds, and such objects may overlap one another. The algorithm can separate text from grayscale or true-color document images, regardless of whether the text overlays a simple, slowly varying, or highly varying background. Experimental results obtained with various document images show that the proposed algorithm can successfully segment Chinese and English text strings from various backgrounds, regardless of whether the text overlaps a simple, slowly varying, or rapidly varying background. Ben-Fei Wu 吳炳飛 2003 degree thesis ; thesis 92 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Chiao Tung University === Master's Degree Program, College of Electrical Engineering and Computer Science === 91 === Text extraction techniques are widely applied to document images. The complexity of the background image is critical to the application of text extraction techniques. Extracting text from a complex compound document image is an important issue in document analysis. The local histogram distribution of a document image reveals many features, and those features are well suited to document image analysis. This thesis presents a text extraction algorithm that extracts text from different compound document images based on features of the local histogram distribution, the size of the text, and the direction of the text strings. A compound document image contains several kinds of objects, including differently colored text, figures, scenes, and complex backgrounds, and such objects may overlap one another. The algorithm can separate text from grayscale or true-color document images, regardless of whether the text overlays a simple, slowly varying, or highly varying background. Experimental results obtained with various document images show that the proposed algorithm can successfully segment Chinese and English text strings from various backgrounds, regardless of whether the text overlaps a simple, slowly varying, or rapidly varying background.
author2	Ben-Fei Wu
author_facet	Ben-Fei Wu Wen-Pin Wang 王文賓
author	Wen-Pin Wang 王文賓
spellingShingle	Wen-Pin Wang 王文賓 A Text Extraction Algorithm of Complex Document Images
author_sort	Wen-Pin Wang
title	A Text Extraction Algorithm of Complex Document Images
title_short	A Text Extraction Algorithm of Complex Document Images
title_full	A Text Extraction Algorithm of Complex Document Images
title_fullStr	A Text Extraction Algorithm of Complex Document Images
title_full_unstemmed	A Text Extraction Algorithm of Complex Document Images
title_sort	text extraction algorithm of complex document images
publishDate	2003
url	http://ndltd.ncl.edu.tw/handle/65385313010865626593

Similar Items