The Measure of the Archive: The Robustness of Network Analysis in Early Modern Correspondence

Network analysis of historical correspondence can be a fruitful way to address historical research questions, and has been increasingly used in historical studies over the past decade. As with many areas of quantitative humanities research, the reliability of the results is often called into questi...


Bibliographic Details
Main Authors: Yann C. Ryan, Sebastian E. Ahnert
Format: Article
Language: English
Published: Department of Languages, Literatures, and Cultures at McGill University 2021-07-01
Series: Journal of Cultural Analytics
Online Access: https://culturalanalytics.scholasticahq.com/article/25943-the-measure-of-the-archive-the-ro-bustness-of-network-analysis-in-early-modern-correspondence.pdf
id doaj-2a4750b687ab41569a7199dbd1efa12a
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Yann C. Ryan
Sebastian E. Ahnert
title The Measure of the Archive: The Robustness of Network Analysis in Early Modern Correspondence
publisher Department of Languages, Literatures, and Cultures at McGill University
series Journal of Cultural Analytics
issn 2371-4549
publishDate 2021-07-01
description Network analysis of historical correspondence can be a fruitful way to address historical research questions, and has been increasingly used in historical studies over the past decade. As with many areas of quantitative humanities research, the reliability of the results is often called into question, given that such approaches require ‘hard data’ as input, yet almost inevitably use datasets with partial or missing records. Other disciplines using network analysis have conducted robustness experiments designed to test the impact of data loss or error on their results. In order to test how this missing data might affect our own area of research, we conducted a number of experiments designed to simulate the impact of the kinds of loss often seen in historical correspondence data, including random document loss, missing years, and errors in the disambiguation and de-duplication process. The results show that most network centrality measures maintain robustness until a very large proportion of the data (60% or more) is removed. Some measures showed a linear change in robustness, while others remained high and then fell off sharply. Only one, transitivity (local clustering coefficient), was significantly impacted throughout. We tested a range of data loss scenarios (random single letters, folio books of manuscript letters, catalogues, and entire years) and a range of commonly used network metrics. In addition, we tested the robustness of more complex network analysis results in the literature that combine several network metrics to highlight individuals in the network, and found that the same types of individuals would likely have been highlighted even with 50% random letter loss. Alongside the article is a web application, built using Shiny, which will calculate robustness measures for a user-uploaded network dataset. We conclude that researchers working with similar historical correspondence datasets might be able to consider network analysis results to be robust in most cases, rather than work on the assumption that missing data would lead to very different findings or results.
url https://culturalanalytics.scholasticahq.com/article/25943-the-measure-of-the-archive-the-ro-bustness-of-network-analysis-in-early-modern-correspondence.pdf
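
The random letter-loss scenario described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or their Shiny application. The toy letter list is synthetic, and the choice of degree as the centrality measure and Spearman rank correlation as the robustness score are assumptions made for illustration; the names `degree`, `spearman`, `robustness` and `toy_letters` are hypothetical.

```python
import random
from collections import defaultdict


def degree(letters):
    """Number of distinct correspondents per person (undirected degree)."""
    neighbours = defaultdict(set)
    for sender, recipient in letters:
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    return {person: len(others) for person, others in neighbours.items()}


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant ranking has no defined correlation
    return cov / (var_x * var_y) ** 0.5


def robustness(letters, loss, rng):
    """Drop a fraction `loss` of letters at random; compare degree rankings."""
    keep = rng.sample(letters, round(len(letters) * (1 - loss)))
    full, partial = degree(letters), degree(keep)
    people = sorted(full)
    # People who vanish from the reduced network are scored as degree 0.
    return spearman([full[p] for p in people], [partial.get(p, 0) for p in people])


def toy_letters(rng, n_people=40, n_letters=600):
    """Synthetic (sender, recipient) pairs with a few very active writers."""
    people = [f"person_{i}" for i in range(n_people)]
    weights = [1 / (i + 1) for i in range(n_people)]
    letters = []
    while len(letters) < n_letters:
        a, b = rng.choices(people, weights=weights, k=2)
        if a != b:
            letters.append((a, b))
    return letters


if __name__ == "__main__":
    rng = random.Random(0)
    letters = toy_letters(rng)
    for loss in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"loss={loss:.0%}  spearman={robustness(letters, loss, rng):.3f}")
```

A single run per loss level is noisy. Averaging the score over many random samples at each level, and repeating it for other metrics, would be closer in spirit to the experiments the abstract describes.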