The health care and life sciences community profile for dataset descriptions

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of d...

Bibliographic Details
Main Authors: Michel Dumontier, Alasdair J.G. Gray, M. Scott Marshall, Vladimir Alexiev, Peter Ansell, Gary Bader, Joachim Baran, Jerven T. Bolleman, Alison Callahan, José Cruz-Toledo, Pascale Gaudet, Erich A. Gombocz, Alejandra N. Gonzalez-Beltran, Paul Groth, Melissa Haendel, Maori Ito, Simon Jupp, Nick Juty, Toshiaki Katayama, Norio Kobayashi, Kalpana Krishnaswami, Camille Laibe, Nicolas Le Novère, Simon Lin, James Malone, Michael Miller, Christopher J. Mungall, Laurens Rietveld, Sarala M. Wimalaratne, Atsuko Yamaguchi
Format: Article
Language: English
Published: PeerJ Inc. 2016-08-01
Series: PeerJ
Subjects: Data profiling, Dataset descriptions, Metadata, Provenance, FAIR data
Online Access: https://peerj.com/articles/2331.pdf
id doaj-87a6908c1991464f8c104ef7ffc724a9
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Michel Dumontier
Alasdair J.G. Gray
M. Scott Marshall
Vladimir Alexiev
Peter Ansell
Gary Bader
Joachim Baran
Jerven T. Bolleman
Alison Callahan
José Cruz-Toledo
Pascale Gaudet
Erich A. Gombocz
Alejandra N. Gonzalez-Beltran
Paul Groth
Melissa Haendel
Maori Ito
Simon Jupp
Nick Juty
Toshiaki Katayama
Norio Kobayashi
Kalpana Krishnaswami
Camille Laibe
Nicolas Le Novère
Simon Lin
James Malone
Michael Miller
Christopher J. Mungall
Laurens Rietveld
Sarala M. Wimalaratne
Atsuko Yamaguchi
title The health care and life sciences community profile for dataset descriptions
publisher PeerJ Inc.
series PeerJ
issn 2167-8359
publishDate 2016-08-01
description Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
topic Data profiling
Dataset descriptions
Metadata
Provenance
FAIR data
url https://peerj.com/articles/2331.pdf
spelling doaj-87a6908c1991464f8c104ef7ffc724a9
2020-11-24T21:45:47Z
eng
PeerJ Inc.
PeerJ
2167-8359
2016-08-01
4
e2331
10.7717/peerj.2331
The health care and life sciences community profile for dataset descriptions
Michel Dumontier (0): Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
Alasdair J.G. Gray (1): Department of Computer Science, Heriot-Watt University, Edinburgh, United Kingdom
M. Scott Marshall (2): Department of Radiation Oncology (MAASTRO), GROW - School for Oncology and Developmental Biology, MAASTRO Clinic, Maastricht, Netherlands
Vladimir Alexiev (3): Ontotext Corporation, Sofia, Bulgaria
Peter Ansell (4): CSIRO, Australia
Gary Bader (5): The Donnelly Centre, University of Toronto, Toronto, Canada
Joachim Baran (6): Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
Jerven T. Bolleman (7): Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland
Alison Callahan (8): Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
José Cruz-Toledo (9): Carleton University, Canada
Pascale Gaudet (10): CALIPHO group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland
Erich A. Gombocz (11): IO Informatics, Berkeley, CA, United States of America
Alejandra N. Gonzalez-Beltran (12): Oxford e-Research Centre, University of Oxford, Oxford, Oxfordshire, United Kingdom
Paul Groth (13): Elsevier Labs, Netherlands
Melissa Haendel (14): Department of Medical Informatics and Epidemiology, Oregon Health Sciences University, Portland, OR, United States of America
Maori Ito (15): Office of Medical Informatics and Epidemiology, Pharmaceuticals and Medical Devices Agency, Chiyoda-ku, Japan
Simon Jupp (16): EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
Nick Juty (17): EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
Toshiaki Katayama (18): Database Center for Life Science, Kashiwa, Japan
Norio Kobayashi (19): Advanced Center for Computing and Communication, RIKEN, Wako-shi, Saitama, Japan
Kalpana Krishnaswami (20): Cerenode Inc., United States of America
Camille Laibe (21): EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
Nicolas Le Novère (22): The Babraham Institute, Cambridge, United Kingdom
Simon Lin (23): Nationwide Children’s Hospital, Columbus, OH, United States of America
James Malone (24): EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
Michael Miller (25): Institute for Systems Biology, Seattle, WA, United States of America
Christopher J. Mungall (26): Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
Laurens Rietveld (27): Department of Exact Sciences, VU University Amsterdam, Amsterdam, Netherlands
Sarala M. Wimalaratne (28): EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
Atsuko Yamaguchi (29): Database Center for Life Science, Kashiwa, Japan