Cleaning OCR'd text with Regular Expressions
Optical Character Recognition (OCR)—the conversion of scanned images to machine-encoded text—has proven a godsend for historical research. This process allows texts to be searchable on one hand and more easily parsed and mined on the other. But we’ve all noticed that the OCR for historic texts is fa...
Main Author: | Laura Turner O'Hara |
---|---|
Format: | Article |
Language: | English |
Published: | Editorial Board of the Programming Historian, 2013-05-01 |
Series: | The Programming Historian |
Subjects: | |
Online Access: | http://programminghistorian.org/lessons/cleaning-ocrd-text-with-regular-expressions |
Similar Items
- Understanding Regular Expressions
  by: Doug Knox
  Published: (2013-06-01)
- Generating an Ordered Data Set from an OCR Text File
  by: Jon Crump
  Published: (2014-11-01)
- Text Indexing for Regular Expression Matching
  by: Daniel Gibney, et al.
  Published: (2021-04-01)
- Helping tools for the regular expression author for test questions in LMS Moodle
  by: O. A. Sychev, et al.
  Published: (2016-07-01)
- Regular Expressions with Lookahead
  by: Martin Berglund, et al.
  Published: (2021-04-01)