A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine

Bibliographic Details
Main Authors: Rapeeporn Chamchong, Chun Che Fung
Format: Article
Language: English
Published: Asia University 2015-01-01
Series:Advances in Decision Sciences
Online Access:http://dx.doi.org/10.1155/2015/925935
Affiliation: School of Engineering and Information Technology, Murdoch University, Perth, WA 6150, Australia (both authors)
ISSN: 2090-3359, 2090-3367

Full description

Challenges for text processing in ancient document images arise mainly from the high degree of variation in foreground and background. Image binarization is an image segmentation technique that separates an image into text and background components. Although several techniques for binarizing text documents have been proposed, their performance varies and depends on the image characteristics. Selecting the appropriate binarization technique for each image is therefore a key step toward improved results. This paper proposes a framework for selecting binarization techniques for palm leaf manuscripts using Support Vector Machines (SVMs). The overall process is divided into three steps: (i) feature extraction: feature patterns are extracted from grayscale images based on global intensity, local contrast, and intensity; (ii) treatment of imbalanced data: the imbalanced dataset is balanced using the Synthetic Minority Oversampling Technique (SMOTE) to improve prediction performance; and (iii) selection: an SVM is applied to select the appropriate binarization technique. The proposed framework was evaluated on palm leaf manuscript images and benchmark datasets from the DIBCO series, and prediction performance was compared between the imbalanced and balanced datasets. Experimental results showed that the proposed framework can be used as an integral part of an automatic selection process.
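
The three steps summarized above (feature extraction, SMOTE balancing, SVM selection) can be illustrated with a small, self-contained sketch. The abstract does not give the exact feature definitions, the SVM kernel or parameters, or the list of candidate binarization techniques, so everything here is an assumption: the features (global mean and standard deviation of intensity, plus mean max-min contrast over non-overlapping 3x3 blocks), a simplified SMOTE, a linear SVM trained with the Pegasos sub-gradient method in a one-vs-rest arrangement, the hypothetical labels technique_A and technique_B, and the made-up toy feature vectors. It is not the authors' implementation or data.

```python
import math
import random


def extract_features(img, win=3):
    """Hypothetical global-intensity and local-contrast features (assumption).

    img is a 2-D list of grayscale values in [0, 255]. Returns
    [global mean, global std, mean local contrast], each scaled to [0, 1].
    """
    h, w = len(img), len(img[0])
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    # Local contrast: max - min inside each non-overlapping win x win block.
    contrasts = []
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            block = [img[i + di][j + dj] for di in range(win) for dj in range(win)]
            contrasts.append(max(block) - min(block))
    local = sum(contrasts) / len(contrasts) if contrasts else 0.0
    return [mean / 255.0, std / 255.0, local / 255.0]


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: interpolate a random minority sample toward one of
    its k nearest minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def train_linear_svm(X, y, lam=0.01, epochs=2000, seed=0):
    """Pegasos sub-gradient solver for a linear SVM with labels y in {-1, +1}.
    The bias is a constant 1 feature appended to each sample (so it is regularized)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1:  # hinge loss active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def train_selector(X, labels, **kw):
    """One-vs-rest: one linear SVM per candidate binarization technique."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels], **kw)
            for c in sorted(set(labels))}


def select_technique(models, x):
    """Return the technique whose SVM gives the highest decision score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xb)))


if __name__ == "__main__":
    # Made-up feature vectors: 6 images labelled technique_A, only 2 technique_B.
    A = [[0.10, 0.10, 0.10], [0.15, 0.10, 0.12], [0.12, 0.08, 0.10],
         [0.10, 0.12, 0.14], [0.13, 0.11, 0.09], [0.11, 0.09, 0.13]]
    B = [[0.90, 0.80, 0.85], [0.85, 0.90, 0.80]]
    B_bal = B + smote(B, n_new=len(A) - len(B))  # balance to 6 vs 6
    labels = ["technique_A"] * len(A) + ["technique_B"] * len(B_bal)
    models = train_selector(A + B_bal, labels)
    print(select_technique(models, [0.12, 0.10, 0.11]))
```

Because this SMOTE variant only interpolates between existing minority samples, the synthetic feature vectors stay inside the region already occupied by the minority class. The one-vs-rest arrangement then picks the technique whose classifier returns the largest decision score; a kernel SVM could be swapped in without changing the overall flow.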