Kurdish Optical Character Recognition

Bibliographic Details
Main Authors: Rasty Yaseen, Hossein Hassani
Format: Article
Language: English
Published: University of Kurdistan Hewler 2018-06-01
Series: UKH Journal of Science and Engineering
Online Access: https://journals.ukh.edu.krd/index.php/ukhjse/article/view/38
Description
Summary: Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and is written in several scripts, of which the Persian/Arabic script is widely used across these dialects. The Persian/Arabic script is written from right to left (RTL), is cursive, and uses unique diacritics. These features, particularly the last two, affect the segmentation stage in developing a Kurdish OCR. In this article, we introduce an enhanced character-segmentation-based method that addresses these characteristics. We applied the method to text-only images and tested the Kurdish OCR on documents with different fonts, font sizes, and image resolutions. The experiments showed that the proposed method achieved an average character recognition accuracy of 90.82%.
ISSN: 2520-7792