Wide-Open: Accelerating public data release by automating detection of overdue datasets.

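Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parsing query results from repositories to determine whether the datasets remain private. We demonstrate the effectiveness of this approach on two popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.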
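
The two-step idea in the abstract (mine published text for dataset references, then ask the repository whether each referenced dataset is still private) can be illustrated with a minimal sketch. This is not the authors' Wide-Open implementation. The accession pattern (GEO series `GSE…` and SRA `SRP/SRX/SRR/SRS/SRA…` IDs), the `Article` record, `extract_accessions`, `find_overdue`, and the `is_private` callback are all hypothetical; `is_private` stands in for whatever repository query and result parsing a real system would perform.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Tuple

# Hypothetical pattern: GEO series accessions (GSE...) and SRA accessions
# (SRP/SRX/SRR/SRS/SRA...). Word boundaries avoid matches inside longer tokens.
ACCESSION_RE = re.compile(r"\b(GSE\d+|SR[APRSX]\d+)\b")


def extract_accessions(text: str) -> List[str]:
    """Return unique accession IDs found in text, in order of first appearance."""
    return list(dict.fromkeys(ACCESSION_RE.findall(text)))


@dataclass
class Article:
    article_id: str   # hypothetical identifier for the citing article
    published: date   # publication date of the citing article
    text: str         # full text or abstract to mine for references


def find_overdue(
    articles: Iterable[Article],
    is_private: Callable[[str], bool],  # assumed wrapper around a repository query
    as_of: date,
) -> List[Tuple[str, str]]:
    """Flag (accession, article_id) pairs where a dataset cited by an
    already-published article is still reported as private."""
    overdue: List[Tuple[str, str]] = []
    for article in articles:
        if article.published > as_of:
            continue  # not yet published, so its data need not be public yet
        for accession in extract_accessions(article.text):
            if is_private(accession):
                overdue.append((accession, article.article_id))
    return overdue


if __name__ == "__main__":
    # Made-up accessions and a stub lookup in place of a real repository query.
    private_ids = {"GSE00001"}
    corpus = [
        Article("A1", date(2017, 1, 15), "Data are in GEO (GSE00001) and SRA (SRP000002)."),
        Article("A2", date(2017, 3, 2), "Reads were deposited as SRR0000003."),
    ]
    print(find_overdue(corpus, lambda acc: acc in private_ids, date(2017, 6, 1)))
```

Run as a script, this prints `[('GSE00001', 'A1')]`: only the accession that is both cited in a published article and reported private by the stub is flagged. Keeping the repository check behind a callback separates the text-mining step from the repository-specific parsing, which would differ between GEO and SRA.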


Bibliographic Details
Main Authors: Maxim Grechkin, Hoifung Poon, Bill Howe
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-06-01
Series: PLoS Biology
Online Access:http://europepmc.org/articles/PMC5464523?pdf=render
ISSN: 1544-9173, 1545-7885
DOI: 10.1371/journal.pbio.2002477
Citation: PLoS Biology 15(6): e2002477