Page Layout Analysis of the Document Image Based on the Region Classification in a Decision Hierarchical Structure

Bibliographic Details
Main Author: Hossein Pourghassem
Format: Article
Language: English
Published: Najafabad Branch, Islamic Azad University 2010-10-01
Series: Journal of Intelligent Procedures in Electrical Technology
Subjects:
Online Access: http://jipet.iaun.ac.ir/pdf_4465_e43d4674ce542a6b53ac42bebb470949.html
Description
Summary: Converting a document image to its electronic version is an important problem for storage, search and retrieval applications in office automation systems, and it requires analysis of the document image. In this paper, a hierarchical classification structure based on a two-stage segmentation algorithm is proposed. In this structure, the image is first segmented using the proposed two-stage segmentation algorithm. Then the type of each image region, such as document or non-document, is determined using multiple classifiers arranged in the hierarchical classification structure. The proposed segmentation algorithm combines two algorithms, one based on the wavelet transform and one based on thresholding. Texture features such as correlation, homogeneity and entropy, extracted from the co-occurrence matrix, together with two new features based on the wavelet transform, are used to classify and label the regions of the image. The hierarchical classifier consists of two Multilayer Perceptron (MLP) classifiers and a Support Vector Machine (SVM) classifier. The proposed algorithm is evaluated on a database of document and non-document images collected from the Internet. The experimental results show the efficiency of the proposed approach in region segmentation and classification. The proposed algorithm achieves an accuracy of 97.5% on region classification.
ISSN: 2322-3871, 2345-5594
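
The summary names correlation, homogeneity and entropy as texture features extracted from the co-occurrence matrix. The sketch below illustrates the standard textbook definitions of these three features for a small quantized gray-level region; it is not the paper's implementation. The function names `glcm` and `texture_features`, the single horizontal offset, the symmetric counting, and the handling of constant regions are all assumptions made for illustration, since the record does not state these details.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one offset.

    `image` is a list of rows of integers in [0, levels). The offset
    (dx, dy) and the symmetric counting are illustrative assumptions.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("region too small for the chosen offset")
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Correlation, homogeneity and entropy of a normalized GLCM `p`."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    if var_i > 0 and var_j > 0:
        cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
        correlation = cov / math.sqrt(var_i * var_j)
    else:
        # Correlation is undefined for a constant region; returning 1.0
        # is a convention chosen here, not something the paper specifies.
        correlation = 1.0
    return {"correlation": correlation,
            "homogeneity": homogeneity,
            "entropy": entropy}

# Example region: a 4x4 checkerboard with two gray levels.
board = [[(r + c) % 2 for c in range(4)] for r in range(4)]
features = texture_features(glcm(board, levels=2))
```

In a pipeline like the one the summary describes, a feature vector of this kind would be computed per segmented region and passed to the classifiers; the wavelet-based features and the MLP and SVM stages are not sketched here because the record gives no detail on them.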