A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses
Abstract Background Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need...
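Main Authors: | Andra Waagmeester, Egon L. Willighagen, Andrew I. Su, Martina Kutmon, Jose Emilio Labra Gayo, Daniel Fernández-Álvarez, Quentin Groom, Peter J. Schaap, Lisa M. Verhagen, Jasper J. Koehorst |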
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2021-01-01 |
Series: | BMC Biology |
Subjects: | COVID-19, Wikidata, Linked data, ShEx, Open Science |
Online Access: | https://doi.org/10.1186/s12915-020-00940-y |
id |
doaj-3bffc3df9828436bb991c4ad57f7be4c |
---|---|
record_format |
Article |
spelling |
doaj-3bffc3df9828436bb991c4ad57f7be4c; 2021-01-24T12:26:28Z; eng; BMC; BMC Biology; ISSN 1741-7007; 2021-01-01; 19(1):1-14; 10.1186/s12915-020-00940-y
A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses
Andra Waagmeester (Micelio); Egon L. Willighagen (Department of Bioinformatics – BiGCaT, NUTRIM, Maastricht University); Andrew I. Su (Department of Integrative Structural and Computational Biology, The Scripps Research Institute); Martina Kutmon (Department of Bioinformatics – BiGCaT, NUTRIM, Maastricht University); Jose Emilio Labra Gayo (WESO Research Group, University of Oviedo); Daniel Fernández-Álvarez (WESO Research Group, University of Oviedo); Quentin Groom (Meise Botanic Garden); Peter J. Schaap (Department of Agrotechnology and Food Sciences, Laboratory of Systems and Synthetic Biology, Wageningen University & Research); Lisa M. Verhagen (Intravacc); Jasper J. Koehorst (Department of Agrotechnology and Food Sciences, Laboratory of Systems and Synthetic Biology, Wageningen University & Research)
https://doi.org/10.1186/s12915-020-00940-y
COVID-19; Wikidata; Linked data; ShEx; Open Science |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Andra Waagmeester Egon L. Willighagen Andrew I. Su Martina Kutmon Jose Emilio Labra Gayo Daniel Fernández-Álvarez Quentin Groom Peter J. Schaap Lisa M. Verhagen Jasper J. Koehorst |
title |
A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses |
publisher |
BMC |
series |
BMC Biology |
issn |
1741-7007 |
publishDate |
2021-01-01 |
description |
Abstract
Background: Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help find solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a “commons.” Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.
Results: As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. How this model can be used to make data between various resources interoperable is demonstrated by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications or bots were written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates.
Conclusions: Although this workflow is developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43). |
topic |
COVID-19 Wikidata Linked data ShEx Open Science |
url |
https://doi.org/10.1186/s12915-020-00940-y |