Optical Character Recognition Applied to Romanian Printed Texts of the 18th–20th Century

The paper discusses Optical Character Recognition (OCR) of historical Romanian texts of the 18th–20th centuries printed in the Cyrillic script. We distinguish three epochs (approximately the 18th, 19th, and 20th centuries), each with a different usage of the Cyrillic alphabet in Romanian and, corre...

Bibliographic Details
Main Authors: Svetlana Cojocaru, Alexandru Colesnicov, Ludmila Malahov, Tudor Bumbu
Format: Article
Language: English
Published: Institute of Mathematics and Computer Science of the Academy of Sciences of Moldova 2016-04-01
Series: Computer Science Journal of Moldova
Online Access: http://www.math.md/files/csjm/v24-n1/v24-n1-(pp106-117).pdf
Record ID: doaj-766dd333240d417aa59d0d0e6ac07368
Collection: DOAJ
ISSN: 1561-4042
Volume: 24
Issue: 1(70)
Pages: 106-117
Author Affiliation (all four authors): Institute of Mathematics and Computer Science, Academy of Sciences of Moldova, Academiei str. 5, MD-2028 Chisinau, Moldova
Description: The paper discusses Optical Character Recognition (OCR) of historical Romanian texts of the 18th–20th centuries printed in the Cyrillic script. We distinguish three epochs (approximately the 18th, 19th, and 20th centuries), each with a different usage of the Cyrillic alphabet in Romanian and, correspondingly, a different approach to OCR. We developed historical alphabets and sets of glyph recognition templates specific to each epoch. Dictionaries in the appropriate alphabets and orthographies were also created. In addition, virtual keyboards, fonts, transliteration utilities, etc. were developed. The resulting technology and toolset permit successful recognition of historical Romanian texts in the Cyrillic script. After transliteration to the modern Latin script, these historical documents become accessible without a script barrier.
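The transliteration step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' utility: the table `CYR_TO_LAT` and the function `transliterate` are hypothetical names, and the mapping is a simplified, context-free subset of the 20th-century Moldovan Cyrillic alphabet. Context-dependent letters (for example, those whose Latin form changes before e or i) are deliberately left out or simplified, and the earlier epochs would need their own tables.

```python
# Hypothetical, simplified Cyrillic-to-Latin table (20th-century usage only).
# Assumption: one-to-one, context-free rules; "к" is always rendered as "c",
# which ignores the "ch" spelling needed before e and i.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "д": "d", "е": "e", "ж": "j",
    "з": "z", "и": "i", "к": "c", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "ț", "ш": "ș", "э": "ă",
}


def transliterate(text: str) -> str:
    """Replace each mapped Cyrillic letter; leave everything else unchanged."""
    out = []
    for ch in text:
        lower = ch.lower()
        lat = CYR_TO_LAT.get(lower)
        if lat is None:
            out.append(ch)                 # unmapped character: pass through
        elif ch != lower:
            out.append(lat.capitalize())   # preserve upper case
        else:
            out.append(lat)
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("карте"))  # prints "carte"
```

A production tool would add context rules and dictionary checks on top of such a table; the sketch only shows the basic character substitution.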