Character Segmentation in Chinese Magazines with Mixed Alphabets, Numerals and Figures

Bibliographic Details
Main Authors: Shau-Yu Cheng, 鄭紹余
Other Authors: Hsi-Jian Lee
Format: Others
Language: en_US
Published: 1999
Online Access: http://ndltd.ncl.edu.tw/handle/89403707413783200297
Description
Summary: Master's === National Chiao Tung University === Department of Computer Science and Information Engineering === 87 === A general document processing system usually includes two major modules: a character segmentation module and a character recognition module. In this thesis, we present an automatic system that segments characters efficiently. Our character segmentation system contains two modules: document layout analysis and character segmentation. In the document layout analysis module, we first perform image reduction and connected-component extraction. In the component classification procedure, the connected components are classified as image components or text components. In the block segmentation procedure, we separate the text components from the image components and merge all text components into text blocks. We then perform text line segmentation to segment all text lines in the text blocks. After all text lines have been segmented, we find and extract the initial caps if they exist in the text blocks. Finally, in the character segmentation module, we segment the Chinese characters, English letters, and numerals. In our experiment, the character segmentation rate of our system is about 98.9%, and the processing time is about 5 seconds per page with 1158 characters. These results demonstrate the effectiveness of the proposed system.
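
The first two layout analysis steps named in the summary (connected-component extraction and component classification) can be illustrated with a minimal sketch. This is not the thesis's implementation: the summary does not state the connectivity rule or the classification criteria, so the 8-connectivity, the size-only rule, and the names `connected_components`, `classify_component`, and `max_text_size` are all assumptions made for illustration.

```python
from collections import deque


def connected_components(bitmap):
    """Return bounding boxes (top, left, bottom, right) of foreground regions.

    Assumption: 8-connectivity on a binary image given as a list of rows,
    where 1 marks a foreground (ink) pixel.
    """
    rows = len(bitmap)
    cols = len(bitmap[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def classify_component(box, max_text_size=3):
    """Label a component as "text" or "image" by bounding-box size alone.

    Assumption: max_text_size is a hypothetical threshold in pixels; a real
    system would likely use more features than size.
    """
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    return "text" if max(height, width) <= max_text_size else "image"


# Toy page: two small character-like blobs and one large figure-like block.
page = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 1, 1, 1],
]
boxes = connected_components(page)
labels = [classify_component(box) for box in boxes]
```

On the toy page, the two small blobs fall under the hypothetical threshold and are labeled as text, while the large block is labeled as an image component. Block segmentation would then merge neighboring text boxes into text blocks.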