A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine
Challenges for text processing in ancient document images are mainly due to the high degree of variations in foreground and background. Image binarization is an image segmentation technique used to separate the image into text and background components. Although several techniques for binarizing tex...
Main Authors: | Rapeeporn Chamchong, Chun Che Fung |
---|---|
Format: | Article |
Language: | English |
Published: | Asia University, 2015-01-01 |
Series: | Advances in Decision Sciences |
Online Access: | http://dx.doi.org/10.1155/2015/925935 |
id | doaj-7fcf811037d141ec91f4ecba2b6d21dd |
---|---|
record_format | Article |
spelling | doaj-7fcf811037d141ec91f4ecba2b6d21dd; 2020-11-24T21:45:10Z; eng; Asia University; Advances in Decision Sciences; 2090-3359; 2090-3367; 2015-01-01; 2015; 10.1155/2015/925935; 925935; A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine; Rapeeporn Chamchong [0]; Chun Che Fung [1]; School of Engineering and Information Technology, Murdoch University, Perth, WA 6150, Australia; School of Engineering and Information Technology, Murdoch University, Perth, WA 6150, Australia; Challenges for text processing in ancient document images are mainly due to the high degree of variations in foreground and background. Image binarization is an image segmentation technique used to separate the image into text and background components. Although several techniques for binarizing text documents have been proposed, the performance of these techniques varies and depends on the image characteristics. Therefore, selecting binarization techniques can be a key idea to achieve improved results. This paper proposes a framework for selecting binarizing techniques of palm leaf manuscripts using Support Vector Machines (SVMs). The overall process is divided into three steps: (i) feature extraction: feature patterns are extracted from grayscale images based on global intensity, local contrast, and intensity; (ii) treatment of imbalanced data: imbalanced dataset is balanced by using Synthetic Minority Oversampling Technique as to improve the performance of prediction; and (iii) selection: SVM is applied in order to select the appropriate binarization techniques. The proposed framework has been evaluated with palm leaf manuscript images and benchmarking dataset from DIBCO series and compared the performance of prediction between imbalanced and balanced datasets. Experimental results showed that the proposed framework can be used as an integral part of an automatic selection process.; http://dx.doi.org/10.1155/2015/925935 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Rapeeporn Chamchong; Chun Che Fung |
spellingShingle | Rapeeporn Chamchong; Chun Che Fung; A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine; Advances in Decision Sciences |
author_facet | Rapeeporn Chamchong; Chun Che Fung |
author_sort | Rapeeporn Chamchong |
title | A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine |
title_short | A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine |
title_full | A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine |
title_fullStr | A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine |
title_full_unstemmed | A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine |
title_sort | framework for the selection of binarization techniques on palm leaf manuscripts using support vector machine |
publisher | Asia University |
series | Advances in Decision Sciences |
issn | 2090-3359, 2090-3367 |
publishDate | 2015-01-01 |
description | Challenges for text processing in ancient document images are mainly due to the high degree of variations in foreground and background. Image binarization is an image segmentation technique used to separate the image into text and background components. Although several techniques for binarizing text documents have been proposed, the performance of these techniques varies and depends on the image characteristics. Therefore, selecting binarization techniques can be a key idea to achieve improved results. This paper proposes a framework for selecting binarizing techniques of palm leaf manuscripts using Support Vector Machines (SVMs). The overall process is divided into three steps: (i) feature extraction: feature patterns are extracted from grayscale images based on global intensity, local contrast, and intensity; (ii) treatment of imbalanced data: imbalanced dataset is balanced by using Synthetic Minority Oversampling Technique as to improve the performance of prediction; and (iii) selection: SVM is applied in order to select the appropriate binarization techniques. The proposed framework has been evaluated with palm leaf manuscript images and benchmarking dataset from DIBCO series and compared the performance of prediction between imbalanced and balanced datasets. Experimental results showed that the proposed framework can be used as an integral part of an automatic selection process. |
url | http://dx.doi.org/10.1155/2015/925935 |
work_keys_str_mv | AT rapeepornchamchong aframeworkfortheselectionofbinarizationtechniquesonpalmleafmanuscriptsusingsupportvectormachine AT chunchefung aframeworkfortheselectionofbinarizationtechniquesonpalmleafmanuscriptsusingsupportvectormachine AT rapeepornchamchong frameworkfortheselectionofbinarizationtechniquesonpalmleafmanuscriptsusingsupportvectormachine AT chunchefung frameworkfortheselectionofbinarizationtechniquesonpalmleafmanuscriptsusingsupportvectormachine |
_version_ | 1725906117797609472 |
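
Step (ii) of the abstract, balancing the dataset with the Synthetic Minority Oversampling Technique, might be sketched roughly as follows. This is a minimal standard-library illustration of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours), not the authors' implementation. The function names `smote_oversample` and `balance`, the two-dimensional feature vectors, and the labels `"otsu"` and `"sauvola"` are all assumptions made for the example; the record does not say which features or techniques the paper used. The SVM selection step (iii) is not shown.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(features, labels, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(tuple(x))
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = []
        if len(xs) < target:
            extra = smote_oversample(xs, target - len(xs), k, seed)
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical feature vectors; each label names the technique assumed to
# work best for that image (illustrative values only).
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.3, 0.2), (0.9, 0.8), (0.8, 0.9)]
y = ["otsu", "otsu", "otsu", "otsu", "sauvola", "sauvola"]
Xb, yb = balance(X, y)
```

After balancing, both classes have the same number of samples, and the synthetic minority points lie on the line segment between the two original minority samples. The balanced `Xb`, `yb` could then be passed to any SVM implementation as training data.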