A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses

Abstract. Background: Pandemics, even more than other medical problems, require swift integration of knowledge. When a pandemic is caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need...

Bibliographic Details
Main Authors: Andra Waagmeester, Egon L. Willighagen, Andrew I. Su, Martina Kutmon, Jose Emilio Labra Gayo, Daniel Fernández-Álvarez, Quentin Groom, Peter J. Schaap, Lisa M. Verhagen, Jasper J. Koehorst
Format: Article
Language: English
Published: BMC 2021-01-01
Series: BMC Biology
Subjects: COVID-19, Wikidata, Linked data, ShEx, Open Science
Online Access: https://doi.org/10.1186/s12915-020-00940-y
id doaj-3bffc3df9828436bb991c4ad57f7be4c
record_format Article
spelling doaj-3bffc3df9828436bb991c4ad57f7be4c; 2021-01-24T12:26:28Z; eng; BMC; BMC Biology; ISSN 1741-7007; 2021-01-01; volume 19, issue 1, pages 1-14; DOI 10.1186/s12915-020-00940-y
Andra Waagmeester (Micelio)
Egon L. Willighagen (Department of Bioinformatics – BiGCaT, NUTRIM, Maastricht University)
Andrew I. Su (Department of Integrative Structural and Computational Biology, The Scripps Research Institute)
Martina Kutmon (Department of Bioinformatics – BiGCaT, NUTRIM, Maastricht University)
Jose Emilio Labra Gayo (WESO Research Group, University of Oviedo)
Daniel Fernández-Álvarez (WESO Research Group, University of Oviedo)
Quentin Groom (Meise Botanic Garden)
Peter J. Schaap (Department of Agrotechnology and Food Sciences, Laboratory of Systems and Synthetic Biology, Wageningen University & Research)
Lisa M. Verhagen (Intravacc)
Jasper J. Koehorst (Department of Agrotechnology and Food Sciences, Laboratory of Systems and Synthetic Biology, Wageningen University & Research)
collection DOAJ
author Andra Waagmeester
Egon L. Willighagen
Andrew I. Su
Martina Kutmon
Jose Emilio Labra Gayo
Daniel Fernández-Álvarez
Quentin Groom
Peter J. Schaap
Lisa M. Verhagen
Jasper J. Koehorst
title A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses
publisher BMC
series BMC Biology
issn 1741-7007
publishDate 2021-01-01
description Abstract. Background: Pandemics, even more than other medical problems, require swift integration of knowledge. When a pandemic is caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a “commons.” Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions. Results: As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses, as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. We demonstrate how this model can be used to make data from various resources interoperable by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications, or bots, was written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates. Conclusions: Although this workflow was developed and applied in the context of the COVID-19 pandemic, it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43) to demonstrate its broader applicability.
topic COVID-19
Wikidata
Linked data
ShEx
Open Science
url https://doi.org/10.1186/s12915-020-00940-y