Discrete Script or Cursive Language Identification from Document Images
We present a method for identifying the discrete script or cursive language contained in a document image in only one step. The method depends on extracting a set of global templates that are shared between scripts and languages having common symbol shapes. This results in a small number of template...
Main Author: | Ibrahim S.I. Abuhaiba |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2004-01-01 |
Series: | Journal of King Saud University: Engineering Sciences |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1018363918307906 |
id | doaj-1947a4dc32684b4484c523664f1c5edd |
---|---|
record_format | Article |
spelling | doaj-1947a4dc32684b4484c523664f1c5edd; 2020-11-24T22:03:08Z; eng; Elsevier; Journal of King Saud University: Engineering Sciences; 1018-3639; 2004-01-01; 16; 2; 253; 268; Discrete Script or Cursive Language Identification from Document Images; Ibrahim S.I. Abuhaiba; Department of Electrical and Computer Engineering, Islamic University of Gaza, P.O. Box 1276, Gaza, Palestine; We present a method for identifying the discrete script or cursive language contained in a document image in only one step. The method depends on extracting a set of global templates that are shared between scripts and languages having common symbol shapes. This results in a small number of templates in addition to saving in processing time and memory requirement during program execution. A key point in our approach is that we perform one-dimensional normalization such that the width to height ratio is retained. This preserves the relative geometrical attributes of symbols, which adds to the discriminating power of our algorithm and produces small-size templates. Our algorithm requires less than 15 seconds using Pentium III (866 MHz and 128 MB RAM) to identify the discrete script/cursive language of a document. The very encouraging results of our approach in terms of accuracy and speed make it suitable for use in commercial OCR products. Keywords: Document understanding, Script and language identification, Normalization, Template matching; http://www.sciencedirect.com/science/article/pii/S1018363918307906 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Ibrahim S.I. Abuhaiba |
spellingShingle | Ibrahim S.I. Abuhaiba; Discrete Script or Cursive Language Identification from Document Images; Journal of King Saud University: Engineering Sciences |
author_facet | Ibrahim S.I. Abuhaiba |
author_sort | Ibrahim S.I. Abuhaiba |
title | Discrete Script or Cursive Language Identification from Document Images |
title_short | Discrete Script or Cursive Language Identification from Document Images |
title_full | Discrete Script or Cursive Language Identification from Document Images |
title_fullStr | Discrete Script or Cursive Language Identification from Document Images |
title_full_unstemmed | Discrete Script or Cursive Language Identification from Document Images |
title_sort | discrete script or cursive language identification from document images |
publisher | Elsevier |
series | Journal of King Saud University: Engineering Sciences |
issn | 1018-3639 |
publishDate | 2004-01-01 |
description | We present a method for identifying the discrete script or cursive language contained in a document image in only one step. The method depends on extracting a set of global templates that are shared between scripts and languages having common symbol shapes. This results in a small number of templates in addition to saving in processing time and memory requirement during program execution. A key point in our approach is that we perform one-dimensional normalization such that the width to height ratio is retained. This preserves the relative geometrical attributes of symbols, which adds to the discriminating power of our algorithm and produces small-size templates. Our algorithm requires less than 15 seconds using Pentium III (866 MHz and 128 MB RAM) to identify the discrete script/cursive language of a document. The very encouraging results of our approach in terms of accuracy and speed make it suitable for use in commercial OCR products. Keywords: Document understanding, Script and language identification, Normalization, Template matching |
url | http://www.sciencedirect.com/science/article/pii/S1018363918307906 |
work_keys_str_mv | AT ibrahimsiabuhaiba discretescriptorcursivelanguageidentificationfromdocumentimages |
_version_ | 1725833026167898112 |
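
The abstract states that the method performs one-dimensional normalization such that the width to height ratio is retained. The record gives no implementation details, so the following is only a minimal sketch of that idea, not the author's algorithm. It assumes a symbol is a binary bitmap stored as a list of rows, that the normalized dimension is height, and that nearest-neighbour resampling is acceptable; the function name `normalize_height` and its parameters are hypothetical.

```python
def normalize_height(symbol, target_height):
    """Rescale a binary bitmap to target_height rows, keeping width/height ratio.

    Assumption (not from the record): height is the dimension being fixed,
    and nearest-neighbour sampling is used for resampling.
    """
    src_h = len(symbol)
    src_w = len(symbol[0])
    # Width is scaled by the same factor as height, so the aspect ratio survives.
    target_width = max(1, round(src_w * target_height / src_h))
    out = []
    for r in range(target_height):
        src_r = r * src_h // target_height  # nearest source row (integer math)
        row = []
        for c in range(target_width):
            src_c = c * src_w // target_width  # nearest source column
            row.append(symbol[src_r][src_c])
        out.append(row)
    return out


# A tall, narrow symbol and a short, wide one (hypothetical toy bitmaps).
tall = [[1], [1], [1], [1]]          # 4 rows x 1 column
wide = [[1, 1, 1, 1], [1, 1, 1, 1]]  # 2 rows x 4 columns

tall_norm = normalize_height(tall, 8)  # 8 rows x 2 columns
wide_norm = normalize_height(wide, 8)  # 8 rows x 16 columns
```

After normalization both symbols share the same height but keep very different widths, which is one way to read the abstract's claim that relative geometrical attributes are preserved. Scaling both dimensions to a fixed box would instead map the two toy symbols to identical all-ones bitmaps.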