A Text Extraction Algorithm of Complex Document Images

Bibliographic Details
Main Authors: Wen-Pin Wang, 王文賓
Other Authors: Ben-Fei Wu
Format: Others
Language: zh-TW
Published: 2003
Online Access: http://ndltd.ncl.edu.tw/handle/65385313010865626593
Description
Summary: Master's thesis === National Chiao Tung University === Master's Degree Program, College of Electrical Engineering and Computer Science === 91 === Text extraction techniques are widely applied to document images, and the complexity of the background is critical to their applicability. Extracting text from a complex compound document image is an important problem in document analysis. The local histogram distribution of a document image reveals many features that are well suited to document image analysis. This thesis presents a text extraction algorithm that extracts text from different compound document images based on three kinds of features: the local histogram distribution, the size of the text, and the direction of the text string. A compound document image contains several kinds of objects, including differently colored text, figures, scenes, and complex backgrounds, and these objects may overlap one another. The algorithm can separate text from grayscale or true-color document images, whether the text overlays a simple, slowly varying, or highly varying background. Experimental results on various document images show that the proposed algorithm successfully segments Chinese and English text strings from all three kinds of background.
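
The summary names local histogram features as the basis of the method but does not give the procedure. The following is only an illustrative sketch of block-wise histogram analysis, not the thesis's algorithm: it substitutes Otsu's threshold for whatever histogram features the thesis uses, assumes dark text on a lighter background, and ignores the text-size and string-direction cues. The names `local_histogram`, `otsu_threshold`, `extract_text_mask`, and the parameters `block` and `min_contrast` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of values in 0..255


def local_histogram(img: Image, top: int, left: int, size: int) -> List[int]:
    """Return the 256-bin histogram of one block (clipped at the image edge)."""
    hist = [0] * 256
    for row in img[top:top + size]:
        for value in row[left:left + size]:
            hist[value] += 1
    return hist


def otsu_threshold(hist: List[int]) -> int:
    """Otsu's method: the threshold that maximizes between-class variance."""
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, 0.0
    w0 = 0    # pixel count at or below the candidate threshold
    sum0 = 0  # intensity sum at or below the candidate threshold
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no split at this level
        mean0 = sum0 / w0
        mean1 = (weighted_total - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def extract_text_mask(img: Image, block: int = 8,
                      min_contrast: int = 40) -> List[List[bool]]:
    """Mark the darker class of each contrasted block as candidate text.

    Assumption (not from the thesis): text is darker than its background.
    """
    height, width = len(img), len(img[0])
    mask = [[False] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            hist = local_histogram(img, top, left, block)
            levels = [v for v, count in enumerate(hist) if count > 0]
            if levels[-1] - levels[0] < min_contrast:
                continue  # nearly flat block: treat it as background
            t = otsu_threshold(hist)
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    mask[y][x] = img[y][x] <= t
    return mask
```

Calling `extract_text_mask(img)` on a grayscale image returns a boolean mask of the same size. Working per block rather than with one global threshold is what lets a sketch like this cope with a slowly varying background. A highly varying background would need the additional cues the summary mentions, such as text size and string direction.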