Discrete Script or Cursive Language Identification from Document Images

We present a method for identifying the discrete script or cursive language contained in a document image in only one step. The method depends on extracting a set of global templates that are shared between scripts and languages having common symbol shapes. This results in a small number of template...

Bibliographic Details
Main Author:	Ibrahim S.I. Abuhaiba
Format:	Article
Language:	English
Published:	Elsevier 2004-01-01
Series:	Journal of King Saud University: Engineering Sciences
Online Access:	http://www.sciencedirect.com/science/article/pii/S1018363918307906

id	doaj-1947a4dc32684b4484c523664f1c5edd
record_format	Article
spelling	doaj-1947a4dc32684b4484c523664f1c5edd; 2020-11-24T22:03:08Z; eng; Elsevier; Journal of King Saud University: Engineering Sciences; 1018-3639; 2004-01-01; 16; 2; 253; 268; Discrete Script or Cursive Language Identification from Document Images; Ibrahim S.I. Abuhaiba; Department of Electrical and Computer Engineering, Islamic University of Gaza, P.O. Box 1276, Gaza, Palestine; http://www.sciencedirect.com/science/article/pii/S1018363918307906
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Ibrahim S.I. Abuhaiba
title	Discrete Script or Cursive Language Identification from Document Images
publisher	Elsevier
series	Journal of King Saud University: Engineering Sciences
issn	1018-3639
publishDate	2004-01-01
description	We present a method for identifying the discrete script or cursive language contained in a document image in a single step. The method depends on extracting a set of global templates that are shared between scripts and languages having common symbol shapes. This yields a small number of templates and saves processing time and memory during program execution. A key point in our approach is that we perform one-dimensional normalization such that the width-to-height ratio is retained. This preserves the relative geometrical attributes of symbols, which adds to the discriminating power of our algorithm and produces small templates. Our algorithm requires less than 15 seconds on a Pentium III (866 MHz, 128 MB RAM) to identify the discrete script or cursive language of a document. The encouraging accuracy and speed of our approach make it suitable for use in commercial OCR products. Keywords: Document understanding, Script and language identification, Normalization, Template matching
url	http://www.sciencedirect.com/science/article/pii/S1018363918307906
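
The one-dimensional normalization mentioned in the description (scaling a symbol so that its width-to-height ratio is retained) might be sketched as follows. This is only an illustration under stated assumptions, not the article's implementation: the function name `normalize_height`, the list-of-rows binary image representation, and the nearest-neighbour sampling are all hypothetical choices made for this sketch.

```python
def normalize_height(symbol, target_height):
    """Scale a binary symbol image to a fixed height, keeping its aspect ratio.

    `symbol` is a non-empty list of equal-length rows of 0/1 pixels.
    Only the height is fixed; the width follows from the same scale
    factor, so the width-to-height ratio is (approximately) preserved.
    Nearest-neighbour sampling is an assumption of this sketch.
    """
    src_h = len(symbol)
    src_w = len(symbol[0])
    scale = target_height / src_h
    # Width is derived from the height scale, not chosen independently.
    target_width = max(1, round(src_w * scale))

    out = []
    for r in range(target_height):
        src_r = min(src_h - 1, int(r / scale))
        row = [
            symbol[src_r][min(src_w - 1, int(c / scale))]
            for c in range(target_width)
        ]
        out.append(row)
    return out


# A 4-row by 2-column glyph scaled to height 8 becomes 8 rows by 4 columns.
glyph = [[1, 0], [1, 0], [1, 1], [1, 1]]
scaled = normalize_height(glyph, 8)
```

Because a single scale factor drives both dimensions, a tall narrow symbol stays tall and narrow after normalization, which is the property the description credits for the method's discriminating power.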

Similar Items