PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.

The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures. As of mid-2021, the database holds almost 180,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and o...

Bibliographic Details
Main Authors: Bulat Faezov, Roland L Dunbrack
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0253411
id doaj-7f3f0cfc76ae4757a2a52e26dc666046
record_format Article
spelling doaj-7f3f0cfc76ae4757a2a52e26dc666046 2021-07-23T04:31:06Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2021-01-01 16 7 e0253411 10.1371/journal.pone.0253411 PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences. Bulat Faezov Roland L Dunbrack The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures. As of mid-2021, the database holds almost 180,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions, including binding partners such as ligands, nucleic acids, or other proteins; mutations; and post-translational modifications, thus enabling extensive comparative structure-function studies. However, these studies are made more difficult because the PDB allows authors to number the amino acids in each protein sequence in any manner they wish. As a result, the same protein may be numbered differently across the available PDB entries. For instance, some authors may include N-terminal signal peptides or the N-terminal methionine in the sequence numbering and others may not. In addition to the coordinates, there are many fields that contain structural and functional information regarding specific residues numbered according to the author's scheme. Here we provide a webserver and Python 3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries or a list of UniProt identifiers (e.g., "P04637" or "P53_HUMAN") and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe. https://doi.org/10.1371/journal.pone.0253411
collection DOAJ
language English
format Article
sources DOAJ
author Bulat Faezov
Roland L Dunbrack
spellingShingle Bulat Faezov
Roland L Dunbrack
PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.
PLoS ONE
author_facet Bulat Faezov
Roland L Dunbrack
author_sort Bulat Faezov
title PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.
title_short PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.
title_full PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.
title_fullStr PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.
title_full_unstemmed PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.
title_sort pdbrenum: a webserver and program providing protein data bank files renumbered according to their uniprot sequences.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2021-01-01
description The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures. As of mid-2021, the database holds almost 180,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions, including binding partners such as ligands, nucleic acids, or other proteins; mutations; and post-translational modifications, thus enabling extensive comparative structure-function studies. However, these studies are made more difficult because the PDB allows authors to number the amino acids in each protein sequence in any manner they wish. As a result, the same protein may be numbered differently across the available PDB entries. For instance, some authors may include N-terminal signal peptides or the N-terminal methionine in the sequence numbering and others may not. In addition to the coordinates, there are many fields that contain structural and functional information regarding specific residues numbered according to the author's scheme. Here we provide a webserver and Python 3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries or a list of UniProt identifiers (e.g., "P04637" or "P53_HUMAN") and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe.
url https://doi.org/10.1371/journal.pone.0253411
work_keys_str_mv AT bulatfaezov pdbrenumawebserverandprogramprovidingproteindatabankfilesrenumberedaccordingtotheiruniprotsequences
AT rolandldunbrack pdbrenumawebserverandprogramprovidingproteindatabankfilesrenumberedaccordingtotheiruniprotsequences
_version_ 1721290762641997824
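
The abstract in this record describes replacing author residue numbering with UniProt-derived numbering. The sketch below illustrates only that general idea on legacy PDB coordinate lines; it is not the PDBrenum implementation. The function name `renumber_pdb_lines` and the hand-written `mapping` dictionary are hypothetical. In the described tool the correspondence comes from SIFTS, and many more record types are updated than the ATOM/HETATM lines handled here. Insertion codes are ignored for simplicity.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping type: (chain ID, author residue number) -> UniProt residue number.
# In practice this correspondence would be derived from SIFTS; here it is supplied by hand.
ResidueMap = Dict[Tuple[str, int], int]


def renumber_pdb_lines(lines: Iterable[str], mapping: ResidueMap) -> List[str]:
    """Return a copy of legacy-PDB lines with ATOM/HETATM residue numbers remapped.

    Legacy PDB format stores the chain ID in column 22 and the residue
    sequence number in columns 23-26 (0-based slices [21] and [22:26]).
    Residues missing from the mapping keep their author numbering.
    """
    renumbered = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 27:
            chain_id = line[21]
            try:
                author_number = int(line[22:26])
            except ValueError:
                # Malformed residue field: leave the line untouched.
                renumbered.append(line)
                continue
            new_number = mapping.get((chain_id, author_number))
            if new_number is not None:
                # The field is only four columns wide in the legacy format.
                if not -999 <= new_number <= 9999:
                    raise ValueError(
                        f"residue number {new_number} does not fit in 4 columns"
                    )
                line = f"{line[:22]}{new_number:>4d}{line[26:]}"
        renumbered.append(line)
    return renumbered
```

For example, if a construct's author numbering starts at 1 but the first residue corresponds to UniProt position 94, passing `{("A", 1): 94}` rewrites the residue field of chain A, residue 1, and leaves every other column unchanged. The four-column limit is one reason a real tool may prefer mmCIF output for long sequences.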