Development of a Two-Stage Segmentation-Based Word Searching Method for Handwritten Document Images
Word searching or keyword spotting is an important research problem in the domain of document image processing. The solution to the said problem for handwritten documents is more challenging than for printed ones. In this work, a two-stage word searching schema is introduced. In the first stage, all...
Main Authors: | Malakar Samir, Ghosh Manosij, Sarkar Ram, Nasipuri Mita |
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter 2018-07-01 |
Series: | Journal of Intelligent Systems |
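Subjects: | word searching; hog feature; topological feature; holistic word recognition; handwritten documents; quwi database |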
Online Access: | https://doi.org/10.1515/jisys-2017-0384 |
id |
doaj-52a343250ac94164b35ebf0d20614615 |
---|---|
record_format |
Article |
spelling |
doaj-52a343250ac94164b35ebf0d20614615; 2021-09-06T19:40:38Z; eng; De Gruyter; Journal of Intelligent Systems; 0334-1860; 2191-026X; 2018-07-01; 29; 1; 719; 735; 10.1515/jisys-2017-0384; Development of a Two-Stage Segmentation-Based Word Searching Method for Handwritten Document Images; Malakar Samir (0); Ghosh Manosij (1); Sarkar Ram (2); Nasipuri Mita (3); (0) Department of Computer Science, Asutosh College, Kolkata, India; (1) Department of Computer Science and Engineering, Jadavpur University, Kolkata, India; (2) Department of Computer Science and Engineering, Jadavpur University, Kolkata, India; (3) Department of Computer Science and Engineering, Jadavpur University, Kolkata, India; Word searching or keyword spotting is an important research problem in the domain of document image processing. The solution to the said problem for handwritten documents is more challenging than for printed ones. In this work, a two-stage word searching schema is introduced. In the first stage, all the irrelevant words with respect to a search word are filtered out from the document page image. This is carried out using a zonal feature vector, called pre-selection feature vector, along with a rule-based binary classification method. In the next step, a holistic word recognition paradigm is used to confirm a pre-selected word as search word. To accomplish this, a modified histogram of oriented gradients-based feature descriptor is combined with a topological feature vector. This method is experimented on a QUWI English database, which is freely available through the International Conference on Document Analysis and Recognition 2015 competition entitled “Writer Identification and Gender Classification.” This technique not only provides good retrieval performance in terms of recall, precision, and F-measure scores, but it also outperforms some state-of-the-art methods.; https://doi.org/10.1515/jisys-2017-0384; word searching; hog feature; topological feature; holistic word recognition; handwritten documents; quwi database |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Malakar Samir; Ghosh Manosij; Sarkar Ram; Nasipuri Mita |
author_sort |
Malakar Samir |
title |
Development of a Two-Stage Segmentation-Based Word Searching Method for Handwritten Document Images |
title_sort |
development of a two-stage segmentation-based word searching method for handwritten document images |
publisher |
De Gruyter |
series |
Journal of Intelligent Systems |
issn |
0334-1860 2191-026X |
publishDate |
2018-07-01 |
description |
Word searching, or keyword spotting, is an important research problem in the domain of document image processing. The problem is more challenging for handwritten documents than for printed ones. In this work, a two-stage word searching schema is introduced. In the first stage, all words irrelevant to the search word are filtered out of the document page image. This is carried out using a zonal feature vector, called the pre-selection feature vector, along with a rule-based binary classification method. In the second stage, a holistic word recognition paradigm is used to confirm a pre-selected word as the search word. To accomplish this, a modified histogram of oriented gradients (HOG)-based feature descriptor is combined with a topological feature vector. The method is evaluated on the QUWI English database, which is freely available through the International Conference on Document Analysis and Recognition 2015 competition entitled “Writer Identification and Gender Classification.” The technique not only provides good retrieval performance in terms of recall, precision, and F-measure scores, but also outperforms some state-of-the-art methods. |
topic |
word searching; hog feature; topological feature; holistic word recognition; handwritten documents; quwi database |
url |
https://doi.org/10.1515/jisys-2017-0384 |