Optical Character Recognition Applied to Romanian Printed Texts of the 18th–20th Century
The paper discusses Optical Character Recognition (OCR) of historical Romanian texts of the 18th–20th centuries written in the Cyrillic script. We distinguish three epochs (approximately the 18th, 19th, and 20th centuries), each with a different usage of the Cyrillic alphabet in Romanian and, corre...
Main Authors: | Svetlana Cojocaru, Alexandru Colesnicov, Ludmila Malahov, Tudor Bumbu |
---|---|
Format: | Article |
Language: | English |
Published: | Institute of Mathematics and Computer Science of the Academy of Sciences of Moldova, 2016-04-01 |
Series: | Computer Science Journal of Moldova |
Online Access: | http://www.math.md/files/csjm/v24-n1/v24-n1-(pp106-117).pdf |
id |
doaj-766dd333240d417aa59d0d0e6ac07368 |
---|---|
record_format |
Article |
spelling |
doaj-766dd333240d417aa59d0d0e6ac07368; 2020-11-24T22:39:16Z; eng; Institute of Mathematics and Computer Science of the Academy of Sciences of Moldova; Computer Science Journal of Moldova; 1561-4042; 2016-04-01; vol. 24, no. 1(70), pp. 106-117; Optical Character Recognition Applied to Romanian Printed Texts of the 18th–20th Century; Svetlana Cojocaru, Alexandru Colesnicov, Ludmila Malahov, Tudor Bumbu (all: Institute of Mathematics and Computer Science, Academy of Sciences of Moldova, Academiei str. 5, MD-2028 Chisinau, Moldova); The paper discusses Optical Character Recognition (OCR) of historical Romanian texts of the 18th–20th centuries written in the Cyrillic script. We distinguish three epochs (approximately the 18th, 19th, and 20th centuries), each with a different usage of the Cyrillic alphabet in Romanian and, correspondingly, a different approach to OCR. We developed historical alphabets and sets of glyph recognition templates specific to each epoch. Dictionaries in the proper alphabets and orthographies were also created. In addition, virtual keyboards, fonts, transliteration utilities, etc. were developed. The resulting technology and toolset permit successful recognition of historical Romanian texts in the Cyrillic script. After transliteration to the modern Latin script, we obtain barrier-free access to historical documents.; http://www.math.md/files/csjm/v24-n1/v24-n1-(pp106-117).pdf |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Svetlana Cojocaru Alexandru Colesnicov Ludmila Malahov Tudor Bumbu |
author_facet |
Svetlana Cojocaru Alexandru Colesnicov Ludmila Malahov Tudor Bumbu |
author_sort |
Svetlana Cojocaru |
title |
Optical Character Recognition Applied to Romanian Printed Texts of the 18th–20th Century |
publisher |
Institute of Mathematics and Computer Science of the Academy of Sciences of Moldova |
series |
Computer Science Journal of Moldova |
issn |
1561-4042 |
publishDate |
2016-04-01 |
description |
The paper discusses Optical Character Recognition (OCR) of historical Romanian texts of the 18th–20th centuries written in the Cyrillic script.
We distinguish three epochs (approximately the 18th, 19th, and 20th centuries), each with a different usage of the Cyrillic alphabet in Romanian and, correspondingly, a different approach to OCR.
We developed historical alphabets and sets of glyph recognition templates specific to each epoch. Dictionaries in the proper alphabets and orthographies were also created. In addition, virtual keyboards, fonts, transliteration utilities, etc. were developed.
The resulting technology and toolset permit successful recognition of historical Romanian texts in the Cyrillic script. After transliteration to the modern Latin script, we obtain barrier-free access to historical documents. |
url |
http://www.math.md/files/csjm/v24-n1/v24-n1-(pp106-117).pdf |