Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.

The design of a machine which reads unconstrained words still remains an unsolved problem. For example, automatic interpretation of handwritten documents by a computer is still under research. Most systems attempt to segment words into letters and read words one character at a time. However, segment...

Full description

Bibliographic Details
Main Author: AlKhateeb, Jawad H.Y.
Other Authors: Jiang, Jianmin
Language:en
Published: University of Bradford 2010
Subjects:
Online Access:http://hdl.handle.net/10454/4440
id ndltd-BRADFORD-oai-bradscholars.brad.ac.uk-10454-4440
record_format oai_dc
spelling ndltd-BRADFORD-oai-bradscholars.brad.ac.uk-10454-44402019-09-21T03:01:53Z Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches. AlKhateeb, Jawad H.Y. Jiang, Jianmin Ipson, Stanley S. Arabic text Pattern recognition Word recognition Handwritten documents Text classification and recognition Feature extraction The design of a machine which reads unconstrained words still remains an unsolved problem. For example, automatic interpretation of handwritten documents by a computer is still under research. Most systems attempt to segment words into letters and read words one character at a time. However, segmenting handwritten words is very difficult. So to avoid this words are treated as a whole. This research investigates a number of features computed from whole words for the recognition of handwritten words in particular. Arabic text classification and recognition is a complicated process compared to Latin and Chinese text recognition systems. This is due to the nature cursiveness of Arabic text. The work presented in this thesis is proposed for word based recognition of handwritten Arabic scripts. This work is divided into three main stages to provide a recognition system. The first stage is the pre-processing, which applies efficient pre-processing methods which are essential for automatic recognition of handwritten documents. In this stage, techniques for detecting baseline and segmenting words in handwritten Arabic text are presented. Then connected components are extracted, and distances between different components are analyzed. The statistical distribution of these distances is then obtained to determine an optimal threshold for word segmentation. The second stage is feature extraction. This stage makes use of the normalized images to extract features that are essential in recognizing the images. Various method of feature extraction are implemented and examined. The third and final stage is the classification. Various classifiers are used for classification such as K nearest neighbour classifier (k-NN), neural network classifier (NN), Hidden Markov models (HMMs), and the Dynamic Bayesian Network (DBN). To test this concept, the particular pattern recognition problem studied is the classification of 32492 words using ii the IFN/ENIT database. The results were promising and very encouraging in terms of improved baseline detection and word segmentation for further recognition. Moreover, several feature subsets were examined and a best recognition performance of 81.5% is achieved. 2010-10-01T15:23:05Z 2010-10-01T15:23:05Z 2010-10-01T15:23:05Z 2010 Thesis doctoral PhD http://hdl.handle.net/10454/4440 en <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/3.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-nd/3.0/88x31.png" /></a><br />The University of Bradford theses are licenced under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/3.0/">Creative Commons Licence</a>. University of Bradford Department of Electronic Imaging and Media Communications
collection NDLTD
language en
sources NDLTD
topic Arabic text
Pattern recognition
Word recognition
Handwritten documents
Text classification and recognition
Feature extraction
spellingShingle Arabic text
Pattern recognition
Word recognition
Handwritten documents
Text classification and recognition
Feature extraction
AlKhateeb, Jawad H.Y.
Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.
description The design of a machine which reads unconstrained words still remains an unsolved problem. For example, automatic interpretation of handwritten documents by a computer is still under research. Most systems attempt to segment words into letters and read words one character at a time. However, segmenting handwritten words is very difficult. So to avoid this words are treated as a whole. This research investigates a number of features computed from whole words for the recognition of handwritten words in particular. Arabic text classification and recognition is a complicated process compared to Latin and Chinese text recognition systems. This is due to the nature cursiveness of Arabic text. The work presented in this thesis is proposed for word based recognition of handwritten Arabic scripts. This work is divided into three main stages to provide a recognition system. The first stage is the pre-processing, which applies efficient pre-processing methods which are essential for automatic recognition of handwritten documents. In this stage, techniques for detecting baseline and segmenting words in handwritten Arabic text are presented. Then connected components are extracted, and distances between different components are analyzed. The statistical distribution of these distances is then obtained to determine an optimal threshold for word segmentation. The second stage is feature extraction. This stage makes use of the normalized images to extract features that are essential in recognizing the images. Various method of feature extraction are implemented and examined. The third and final stage is the classification. Various classifiers are used for classification such as K nearest neighbour classifier (k-NN), neural network classifier (NN), Hidden Markov models (HMMs), and the Dynamic Bayesian Network (DBN). To test this concept, the particular pattern recognition problem studied is the classification of 32492 words using ii the IFN/ENIT database. The results were promising and very encouraging in terms of improved baseline detection and word segmentation for further recognition. Moreover, several feature subsets were examined and a best recognition performance of 81.5% is achieved.
author2 Jiang, Jianmin
author_facet Jiang, Jianmin
AlKhateeb, Jawad H.Y.
author AlKhateeb, Jawad H.Y.
author_sort AlKhateeb, Jawad H.Y.
title Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.
title_short Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.
title_full Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.
title_fullStr Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.
title_full_unstemmed Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.
title_sort word based off-line handwritten arabic classification and recognition. design of automatic recognition system for large vocabulary offline handwritten arabic words using machine learning approaches.
publisher University of Bradford
publishDate 2010
url http://hdl.handle.net/10454/4440
work_keys_str_mv AT alkhateebjawadhy wordbasedofflinehandwrittenarabicclassificationandrecognitiondesignofautomaticrecognitionsystemforlargevocabularyofflinehandwrittenarabicwordsusingmachinelearningapproaches
_version_ 1719252509781917696