Machine learning, template matching, and the International Tracing Service digital archive: Automating the retrieval of death certificate reference cards from 40 million document scans

Bibliographic Details
Main Author: Lee, B.C.G. (Author)
Format: Article
Language: English
Published: Oxford University Press 2019
Description
Summary: Scattered throughout the International Tracing Service (ITS) digital archive, one of the largest and most heterogeneous collections of Holocaust-related material, are hundreds of thousands of reference cards to official death certificates recording a fraction of individuals who perished within concentration camps. These cards represent the most comprehensive collection of digital material pertaining to these death certificates issued by Sonderstandesamt Arolsen, a German civil registry office. However, the reference cards can only be found dispersed throughout the Central Name Index (CNI), ITS's 46+ million-card finding aid that is indexed only by name. Consequently, aggregating the death certificate reference cards for research requires an intractable manual search. I adopt template matching and machine learning to automate the retrieval of these cards from the ITS digital archive. I demonstrate the efficacy of my method on a test set of 22,117 hand-classified cards, reporting 100% precision and 100% recall. Running this algorithm on 39,967,358 scans of cards from the CNI, I identify 312,183 death certificate reference cards in 13.75 days of elapsed real runtime on a personal computer with only a single,
ISSN: 2055-7671
DOI: 10.1093/llc/fqy063
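
Illustrative note: the summary names template matching as one component of the retrieval method and reports precision and recall on a hand-classified test set. The sketch below shows one common form of template matching (zero-mean normalized cross-correlation over a sliding window) together with the precision and recall calculation. It is a minimal illustration only, not the article's implementation: the function names, the toy 4x4 "scan", and the 2x2 template are all hypothetical, and the record does not say which matching score or classifier the author used.

```python
from math import sqrt


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized grayscale grids."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat patch or template has no defined correlation
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(image, template):
    """Slide the template over the image; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-1.0, (0, 0))
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best


def precision_recall(predicted, actual):
    """Precision and recall for parallel lists of boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical toy data: a 2x2 pattern embedded in a 4x4 "scan" at row 1, col 2.
template = [[0, 9],
            [9, 0]]
scan = [[1, 1, 1, 1],
        [1, 1, 0, 9],
        [1, 1, 9, 0],
        [1, 1, 1, 1]]

score, location = best_match(scan, template)
print(score, location)
```

In a pipeline of this kind, a scan whose best score exceeds a chosen threshold would be passed on as a candidate card. That threshold, and any later classification step, are assumptions here and would need to be tuned against hand-labelled examples such as the test set the summary describes.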