The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases
Abstract. Background: Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when at...
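Main Authors: | Leinonen Rasko, Lin Quan, Reisinger Florian, Kerrien Samuel, Martens Lennart, Jones Philip, Côté Richard G, Apweiler Rolf, Hermjakob Henning |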
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2007-10-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/8/401 |
id |
doaj-0f41d121182044bbbf304b6eb8e84d9b |
---|---|
record_format |
Article |
spelling |
doaj-0f41d121182044bbbf304b6eb8e84d9b; 2020-11-24T21:19:07Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2007-10-01; 8; 1; 401; 10.1186/1471-2105-8-401; The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases; Leinonen Rasko, Lin Quan, Reisinger Florian, Kerrien Samuel, Martens Lennart, Jones Philip, Côté Richard G, Apweiler Rolf, Hermjakob Henning; http://www.biomedcentral.com/1471-2105/8/401 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Leinonen Rasko Lin Quan Reisinger Florian Kerrien Samuel Martens Lennart Jones Philip Côté Richard G Apweiler Rolf Hermjakob Henning |
author_sort |
Leinonen Rasko |
title |
The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases |
title_sort |
protein identifier cross-referencing (picr) service: reconciling protein identifiers across multiple source databases |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2007-10-01 |
description |
Abstract. Background: Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results: We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion: We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr. |
url |
http://www.biomedcentral.com/1471-2105/8/401 |
_version_ |
1726007002458488832 |
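
The abstract describes cross-referencing identifiers by 100% sequence identity, with optional limits on source database, taxonomic ID and activity status. The idea can be illustrated with a minimal Python sketch. This is not the PICR implementation or its SOAP/REST API: the `Entry` record, the functions `build_index` and `map_identifier`, and the toy entries (databases `DB_A`, `DB_B`, `DB_C` and their accessions and sequences) are all hypothetical and invented purely for illustration, standing in for a warehouse such as UniParc.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Entry:
    """One hypothetical source-database record held in the warehouse."""
    database: str
    accession: str
    sequence: str
    taxon_id: int
    active: bool


def build_index(entries: Iterable[Entry]) -> Dict[str, List[Entry]]:
    """Group entries by their exact (case-normalised) sequence."""
    index: Dict[str, List[Entry]] = defaultdict(list)
    for entry in entries:
        index[entry.sequence.upper()].append(entry)
    return dict(index)


def map_identifier(
    entries: Iterable[Entry],
    index: Dict[str, List[Entry]],
    accession: str,
    databases: Optional[Set[str]] = None,
    taxon_id: Optional[int] = None,
    active_only: bool = False,
) -> List[Entry]:
    """Return entries whose sequence is identical to that of `accession`."""
    # Find every sequence recorded under the query accession.
    sequences = {e.sequence.upper() for e in entries if e.accession == accession}
    hits: List[Entry] = []
    for seq in sequences:
        for entry in index.get(seq, []):
            if entry.accession == accession:
                continue  # skip the query itself
            if databases is not None and entry.database not in databases:
                continue  # limit by source database
            if taxon_id is not None and entry.taxon_id != taxon_id:
                continue  # limit by taxonomic ID
            if active_only and not entry.active:
                continue  # limit by activity status
            hits.append(entry)
    return sorted(hits, key=lambda e: (e.database, e.accession))


# Toy data, invented for this sketch only.
ENTRIES = [
    Entry("DB_A", "A0001", "MKTAYIAK", 9606, True),
    Entry("DB_B", "B0001", "MKTAYIAK", 9606, True),
    Entry("DB_C", "C0001", "MKTAYIAK", 9606, False),
    Entry("DB_B", "B0002", "MKTAYIAR", 9606, True),   # one residue differs
    Entry("DB_C", "C0002", "MKTAYIAK", 10090, True),
]
INDEX = build_index(ENTRIES)

if __name__ == "__main__":
    for hit in map_identifier(ENTRIES, INDEX, "A0001", taxon_id=9606, active_only=True):
        print(hit.database, hit.accession)
```

Indexing on the full sequence string makes the identity check an exact dictionary lookup, so an entry that differs by a single residue (`B0002` above) is never returned. A real warehouse would more likely key on a checksum of the sequence, which is an assumption not confirmed by this record.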