The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents

In the digital humanities, there is a constant need to turn images and PDF files into plain text to apply analyses such as topic modelling, named entity recognition, and other techniques. However, although there exist different solutions to extract text embedded in PDF files or run OCR on images, th...

Bibliographic Details
Main Authors: Julia Damerow, B. R. Erick Peirson, Manfred D. Laubichler
Format: Article
Language: English
Published: Ubiquity Press 2017-09-01
Series: Journal of Open Research Software
Subjects: OCR
Online Access: https://openresearchsoftware.metajnl.com/articles/164

Similar Items