Data Leakage and Loss in Biodiversity Informatics

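The field of biodiversity informatics is in a massive, “grow-out” phase of creating and enabling large-scale biodiversity data resources. Because perhaps 90% of existing biodiversity data nonetheless remains unavailable for science and policy applications, the question arises as to how these existing and available data records can be mobilized most efficiently and effectively. This situation led to our analysis of several large-scale biodiversity datasets regarding birds and plants, detecting information gaps and documenting data “leakage” or attrition, in terms of data on taxon, time, and place, in each data record. We documented significant data leakage in each data dimension in each dataset. That is, significant numbers of data records are lacking crucial information in terms of taxon, time, and/or place; information on place was consistently the least complete, such that geographic referencing presently represents the most significant factor in degradation of usability of information from biodiversity information resources. Although the full process of digital capture, quality control, and enrichment is important to developing a complete digital record of existing biodiversity information, payoffs in terms of immediate data usability will be greatest with attention paid to the georeferencing challenge.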
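
The per-record leakage described in the abstract can be illustrated with a short sketch that checks each record for taxon, time, and place and reports the fraction of records missing each one. This is a hypothetical illustration, not the authors' analysis. The `Record` class, the helper names (`has_taxon`, `has_time`, `has_place`, `leakage`), the field names (loosely modelled on Darwin Core terms), and the rule that "place" counts only when coordinates are present are all assumptions. The toy records are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    """Hypothetical minimal occurrence record; field names loosely follow
    Darwin Core terms (scientificName, eventDate, decimalLatitude/Longitude)."""
    scientific_name: Optional[str]
    event_date: Optional[str]
    decimal_latitude: Optional[float]
    decimal_longitude: Optional[float]


def has_taxon(r: Record) -> bool:
    # Taxon counts as present if a non-blank name was recorded.
    return bool(r.scientific_name and r.scientific_name.strip())


def has_time(r: Record) -> bool:
    # Time counts as present if any non-blank date string was recorded.
    return bool(r.event_date and r.event_date.strip())


def has_place(r: Record) -> bool:
    # Assumption: "place" counts only when the record is georeferenced
    # (both coordinates present); a text-only locality counts as missing.
    return r.decimal_latitude is not None and r.decimal_longitude is not None


def leakage(records: List[Record]) -> Dict[str, float]:
    """Fraction of records lacking each dimension, plus the fraction
    lacking at least one of taxon, time, and place."""
    if not records:
        raise ValueError("no records to assess")
    missing = {"taxon": 0, "time": 0, "place": 0, "any": 0}
    for r in records:
        t, d, p = has_taxon(r), has_time(r), has_place(r)
        missing["taxon"] += int(not t)
        missing["time"] += int(not d)
        missing["place"] += int(not p)
        missing["any"] += int(not (t and d and p))
    return {k: v / len(records) for k, v in missing.items()}


# Toy records, invented for illustration only.
records = [
    Record("Turdus migratorius", "1998-05-12", 38.96, -95.25),  # complete
    Record("Turdus migratorius", "1998-05-12", None, None),     # no place
    Record(None, "2001", 5.65, -0.19),                          # no taxon
    Record("Quercus alba", None, None, None),                   # no time, no place
]

print(leakage(records))
# {'taxon': 0.25, 'time': 0.25, 'place': 0.5, 'any': 0.75}
```

The "any" fraction shows why losses compound. Each dimension alone may look modest, but a record is fully usable only when all three are present.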

Bibliographic Details
Main Authors: A. Townsend Peterson, Alex Asase, Dora Canhos, Sidnei de Souza, John Wieczorek
Format: Article
Language: English
Published: Pensoft Publishers 2018-11-01
Series: Biodiversity Data Journal
Subjects: biodiversity data; usability; fitness for use
Online Access: https://bdj.pensoft.net/articles.php?id=26826
DOI: 10.3897/BDJ.6.e26826
Volume: 6 (article e26826)
ISSN: 1314-2836, 1314-2828
Affiliations: A. Townsend Peterson (Biodiversity Institute, University of Kansas); Alex Asase (University of Ghana); Dora Canhos (CRIA); Sidnei de Souza (CRIA); John Wieczorek (Museum of Vertebrate Zoology, University of California)