Generating an Ordered Data Set from an OCR Text File

This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it. These illustrations are specific to a particular text, but the overall strategy, and...

Bibliographic Details
Main Author: Jon Crump
Format: Article
Language: English
Published: Editorial Board of the Programming Historian 2014-11-01
Series: The Programming Historian
Subjects: OCR
Online Access: http://programminghistorian.org/lessons/generating-an-ordered-data-set-from-an-OCR-text-file