Discrete Script or Cursive Language Identification from Document Images

We present a method for identifying the discrete script or cursive language contained in a document image in only one step. The method depends on extracting a set of global templates that are shared between scripts and languages having common symbol shapes. This results in a small number of template...

Bibliographic Details
Main Author: Ibrahim S.I. Abuhaiba
Format: Article
Language: English
Published: Elsevier 2004-01-01
Series: Journal of King Saud University: Engineering Sciences
Online Access: http://www.sciencedirect.com/science/article/pii/S1018363918307906
id doaj-1947a4dc32684b4484c523664f1c5edd
record_format Article
spelling doaj-1947a4dc32684b4484c523664f1c5edd; 2020-11-24T22:03:08Z; eng; Elsevier; Journal of King Saud University: Engineering Sciences; ISSN 1018-3639; 2004-01-01; volume 16, issue 2, pages 253-268; Discrete Script or Cursive Language Identification from Document Images; Ibrahim S.I. Abuhaiba (Department of Electrical and Computer Engineering, Islamic University of Gaza, P.O. Box 1276, Gaza, Palestine); http://www.sciencedirect.com/science/article/pii/S1018363918307906
collection DOAJ
language English
format Article
sources DOAJ
author Ibrahim S.I. Abuhaiba
spellingShingle Ibrahim S.I. Abuhaiba; Discrete Script or Cursive Language Identification from Document Images; Journal of King Saud University: Engineering Sciences
author_facet Ibrahim S.I. Abuhaiba
author_sort Ibrahim S.I. Abuhaiba
title Discrete Script or Cursive Language Identification from Document Images
title_sort discrete script or cursive language identification from document images
publisher Elsevier
series Journal of King Saud University: Engineering Sciences
issn 1018-3639
publishDate 2004-01-01
description We present a method for identifying the discrete script or cursive language contained in a document image in a single step. The method extracts a set of global templates that are shared between scripts and languages with common symbol shapes. This yields a small number of templates and reduces processing time and memory requirements during program execution. A key point in our approach is that we perform one-dimensional normalization such that the width-to-height ratio is retained. This preserves the relative geometrical attributes of symbols, which adds to the discriminating power of our algorithm and produces small templates. Our algorithm requires less than 15 seconds on a Pentium III (866 MHz, 128 MB RAM) to identify the discrete script or cursive language of a document. The encouraging results of our approach in terms of accuracy and speed make it suitable for use in commercial OCR products. Keywords: Document understanding, Script and language identification, Normalization, Template matching
url http://www.sciencedirect.com/science/article/pii/S1018363918307906
work_keys_str_mv AT ibrahimsiabuhaiba discretescriptorcursivelanguageidentificationfromdocumentimages
_version_ 1725833026167898112
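
Illustrative note: the abstract describes one-dimensional normalization that retains the width-to-height ratio, followed by template matching. This record does not give the article's actual algorithm, so the following is only a minimal sketch of that general idea under stated assumptions. The function names (normalize_height, match_score, identify), the nearest-neighbour resampling, the pixel-agreement score, the voting rule, and the 0.9 threshold are all hypothetical choices made for illustration, not the author's method.

```python
def normalize_height(symbol, target_height=8):
    """Scale a binary symbol (list of rows of 0/1) to target_height rows.

    Only the height is fixed; the new width is derived from it so that the
    width-to-height ratio is retained (assumed reading of the abstract).
    Nearest-neighbour sampling is an illustrative choice.
    """
    src_h = len(symbol)
    src_w = len(symbol[0])
    new_w = max(1, round(src_w * target_height / src_h))
    out = []
    for r in range(target_height):
        src_r = min(src_h - 1, r * src_h // target_height)
        row = []
        for c in range(new_w):
            src_c = min(src_w - 1, c * src_w // new_w)
            row.append(symbol[src_r][src_c])
        out.append(row)
    return out


def match_score(a, b):
    """Fraction of agreeing pixels; 0.0 when the two shapes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 0.0
    total = len(a) * len(a[0])
    agree = sum(1 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) if pa == pb)
    return agree / total


def identify(symbols, templates, target_height=8, threshold=0.9):
    """Return the label whose templates match the most symbols.

    templates maps a hypothetical script/language label to a list of
    already-normalized templates. Simple voting is an assumption.
    """
    votes = {label: 0 for label in templates}
    for s in symbols:
        n = normalize_height(s, target_height)
        for label, tmpls in templates.items():
            if any(match_score(n, t) >= threshold for t in tmpls):
                votes[label] += 1
    return max(votes, key=votes.get)


# Toy data: a narrow bar (4 rows x 2 columns) and a square (4 x 4).
bar = [[1, 1] for _ in range(4)]
square = [[1, 1, 1, 1] for _ in range(4)]
templates = {"A": [normalize_height(bar)], "B": [normalize_height(square)]}
result = identify([bar, bar, square], templates)
```

Because only the height is fixed, the narrow bar and the square keep different widths after normalization, so a symbol's proportions remain available as a cue during matching.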