BDcleaner: A workflow for cleaning taxonomic and geographic errors in occurrence data archived in biodiversity databases

Bibliographic Details
Main Authors: Jing Jin, Jun Yang
Format: Article
Language: English
Published: Elsevier 2020-03-01
Series: Global Ecology and Conservation
Online Access: http://www.sciencedirect.com/science/article/pii/S235198941930633X
Description
Summary: High-quality data are indispensable for research and management in biodiversity conservation. However, errors in biodiversity data must be removed before the data can be used with confidence. In this study, we developed a workflow for cleaning occurrence data archived in various biodiversity databases. The workflow allows researchers and practitioners to identify taxonomic and geographic errors in millions of records in an automatic, reproducible, and transparent manner. It also allows users to correct several types of taxonomic and geographic errors. We applied the workflow to clean global tree occurrence records. Of the 30,242,556 occurrence records of 58,034 species extracted from eight databases, only 8,624,319 (28.5%) records of 22,766 (39.2%) species were classified as high quality after running through the workflow. Inaccurate and non-standard taxon names appeared to be a more severe problem than the geographic errors with which people are most familiar. The workflow developed in this study can be easily adapted to clean occurrence records of other taxonomic groups, allowing researchers and practitioners to reduce uncertainties in their findings.
Keywords: Data quality, Biodiversity, Data cleaning, Tree species, Conservation
ISSN: 2351-9894
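The summary describes flagging taxonomic and geographic errors automatically and reporting the share of records that pass as high quality. The Python sketch below illustrates that general idea under stated assumptions. It is not BDcleaner's implementation. The `Record` type, the `ACCEPTED_NAMES` set, the helper functions, and the flag names are all hypothetical. A real workflow would typically match names against a full taxonomic backbone and run many more geographic tests than the few shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical stand-in for a taxonomic backbone of accepted names.
ACCEPTED_NAMES = {"Quercus robur", "Fagus sylvatica"}


@dataclass
class Record:
    """One occurrence record (hypothetical, minimal fields)."""
    name: str
    lat: Optional[float]
    lon: Optional[float]


def standardize_name(raw: str) -> str:
    """Collapse whitespace and normalize the case of a binomial.

    Simplification (assumption): keeps only genus and specific epithet,
    so authorship and infraspecific ranks are dropped.
    """
    parts = raw.split()
    if len(parts) < 2:
        return " ".join(parts)
    return f"{parts[0].capitalize()} {parts[1].lower()}"


def quality_flags(rec: Record) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    flags = []
    # Taxonomic check: the standardized name must be an accepted name.
    if standardize_name(rec.name) not in ACCEPTED_NAMES:
        flags.append("unresolved_name")
    # Geographic checks: presence, valid range, and the (0, 0) placeholder.
    if rec.lat is None or rec.lon is None:
        flags.append("missing_coordinates")
    else:
        if not (-90 <= rec.lat <= 90 and -180 <= rec.lon <= 180):
            flags.append("coordinates_out_of_range")
        if rec.lat == 0 and rec.lon == 0:
            flags.append("zero_coordinates")
    return flags


def percent_high_quality(records: Iterable[Record]) -> float:
    """Percentage of records with no flags (0.0 for an empty input)."""
    records = list(records)
    if not records:
        return 0.0
    clean = sum(1 for r in records if not quality_flags(r))
    return 100 * clean / len(records)


if __name__ == "__main__":
    sample = [
        Record("quercus  ROBUR", 51.5, -0.1),   # fixed by standardization
        Record("Fagus sylvatica", 0.0, 0.0),    # zero coordinates
        Record("Fagus sylvatca", 48.0, 11.0),   # misspelled epithet
        Record("Fagus sylvatica", 95.0, 11.0),  # latitude out of range
    ]
    for rec in sample:
        print(rec.name, quality_flags(rec))
    print(percent_high_quality(sample))  # 25.0
    # Share reported in the summary: 8,624,319 of 30,242,556 records.
    print(round(100 * 8_624_319 / 30_242_556, 1))  # 28.5
```

In this sketch the first record passes only because name standardization repairs its spacing and capitalization, which loosely mirrors the summary's point that non-standard names account for much of the data loss. The last line reproduces the reported 28.5% from the record counts given in the summary.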