Development of a Two-Stage Segmentation-Based Word Searching Method for Handwritten Document Images

Word searching, or keyword spotting, is an important research problem in document image processing. Solving it for handwritten documents is more challenging than for printed ones. In this work, a two-stage word searching scheme is introduced. In the first stage, all...

Bibliographic Details
Main Authors: Malakar Samir, Ghosh Manosij, Sarkar Ram, Nasipuri Mita
Format: Article
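Language: English
Published: De Gruyter 2018-07-01
Series: Journal of Intelligent Systems
Subjects: word searching; HOG feature; topological feature; holistic word recognition; handwritten documents; QUWI database
Online Access: https://doi.org/10.1515/jisys-2017-0384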
id doaj-52a343250ac94164b35ebf0d20614615
record_format Article
spelling doaj-52a343250ac94164b35ebf0d20614615; 2021-09-06T19:40:38Z; eng; De Gruyter; Journal of Intelligent Systems; ISSN 0334-1860, 2191-026X; 2018-07-01; vol. 29, no. 1, pp. 719-735; DOI 10.1515/jisys-2017-0384
Malakar Samir: Department of Computer Science, Asutosh College, Kolkata, India
Ghosh Manosij: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Sarkar Ram: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Nasipuri Mita: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
collection DOAJ
language English
format Article
sources DOAJ
author Malakar Samir
Ghosh Manosij
Sarkar Ram
Nasipuri Mita
title Development of a Two-Stage Segmentation-Based Word Searching Method for Handwritten Document Images
publisher De Gruyter
series Journal of Intelligent Systems
issn 0334-1860
2191-026X
publishDate 2018-07-01
description Word searching, or keyword spotting, is an important research problem in document image processing. Solving it for handwritten documents is more challenging than for printed ones. In this work, a two-stage word searching scheme is introduced. In the first stage, all words that are irrelevant to the search word are filtered out of the document page image. This is carried out using a zonal feature vector, called the pre-selection feature vector, together with a rule-based binary classification method. In the second stage, a holistic word recognition paradigm is used to confirm whether a pre-selected word is the search word. To accomplish this, a modified histogram of oriented gradients (HOG)-based feature descriptor is combined with a topological feature vector. The method is evaluated on the QUWI English database, which is freely available through the International Conference on Document Analysis and Recognition 2015 competition entitled “Writer Identification and Gender Classification.” The technique not only provides good retrieval performance in terms of recall, precision, and F-measure scores, but also outperforms some state-of-the-art methods.
topic word searching
HOG feature
topological feature
holistic word recognition
handwritten documents
QUWI database
url https://doi.org/10.1515/jisys-2017-0384
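
The two-stage scheme in the description (a cheap rule-based pre-selection followed by a stricter confirmation step, scored by precision, recall, and F-measure) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the toy 4x4 binary word images, the function names, the thresholds, the two-zone ink-density feature, and in particular the stage-2 descriptor, which is a simple projection profile compared by cosine similarity standing in for the article's modified HOG and topological features.

```python
from math import sqrt

def zonal_density(img, zones=2):
    """Toy pre-selection feature: fraction of ink pixels in each vertical strip."""
    width = len(img[0])
    feats = []
    for z in range(zones):
        lo, hi = z * width // zones, (z + 1) * width // zones
        cells = [row[c] for row in img for c in range(lo, hi)]
        feats.append(sum(cells) / len(cells))
    return feats

def preselect(query, cand, tol=0.25):
    """Stage 1 (hypothetical rule): keep a candidate only if every zone is close to the query."""
    pairs = zip(zonal_density(query), zonal_density(cand))
    return all(abs(q - c) <= tol for q, c in pairs)

def profile(img):
    """Stage 2 placeholder descriptor: row sums followed by column sums."""
    return [sum(r) for r in img] + [sum(col) for col in zip(*img)]

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query, candidates, sim_threshold=0.9):
    """Return indices of candidates that pass the filter and then the confirmation."""
    survivors = [i for i, c in enumerate(candidates) if preselect(query, c)]
    q = profile(query)
    return [i for i in survivors if cosine(q, profile(candidates[i])) >= sim_threshold]

def precision_recall_f(retrieved, relevant):
    """Standard retrieval scores computed over sets of item indices."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy data (invented): 1 = ink pixel, 0 = background.
QUERY = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]
CANDIDATES = [
    [[1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]],  # same shape as the query
    [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 1, 1]],  # mirrored: rejected in stage 1
    [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]],  # same zone densities: rejected in stage 2
    [[1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]],  # query with one extra ink pixel
]

hits = search(QUERY, CANDIDATES)
print(hits, precision_recall_f(hits, relevant=[0, 3]))
```

The point of the split is that the stage-1 rule is cheap and only has to discard clearly irrelevant words, so the more expensive descriptor comparison runs on a much smaller set of survivors.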