A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database

Thousands of plasmid sequences are now publicly available in the NCBI nucleotide database, but they are not reliably annotated to distinguish complete plasmids from plasmid fragments, such as gene or contig sequences; therefore, retrieving complete plasmids for downstream analyses is challenging. He...

Bibliographic Details
Main Authors: Alex Orlek, Hang Phan, Anna E. Sheppard, Michel Doumith, Matthew Ellington, Tim Peto, Derrick Crook, A. Sarah Walker, Neil Woodford, Muna F. Anjum, Nicole Stoesser
Format: Article
Language: English
Published: Elsevier 2017-06-01
Series: Data in Brief
Subjects: Plasmids; Sequence data curation; Complete genomes; Enterobacteriaceae family
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340917301567
id doaj-1d6c9918af444a369681cdd12458e5a8
record_format Article
spelling doaj-1d6c9918af444a369681cdd12458e5a8
2020-11-25T02:12:28Z
eng
Elsevier
Data in Brief
2352-3409
2017-06-01
12C
423
426
10.1016/j.dib.2017.04.024
A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database
Alex Orlek
Hang Phan
Anna E. Sheppard
Michel Doumith
Matthew Ellington
Tim Peto
Derrick Crook
A. Sarah Walker
Neil Woodford
Muna F. Anjum
Nicole Stoesser
Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
Antimicrobial Resistance and Healthcare Associated Infections (AMRHAI) Reference Unit, National Infection Service, Public Health England, London, UK
NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
Thousands of plasmid sequences are now publicly available in the NCBI nucleotide database, but they are not reliably annotated to distinguish complete plasmids from plasmid fragments, such as gene or contig sequences; therefore, retrieving complete plasmids for downstream analyses is challenging. Here we present a curated dataset of complete bacterial plasmids from the clinically relevant Enterobacteriaceae family. The dataset was compiled from the NCBI nucleotide database using curation steps designed to exclude incomplete plasmid sequences, and chromosomal sequences misannotated as plasmids. Over 2000 complete plasmid sequences are included in the curated plasmid dataset. Protein sequences produced from translating each complete plasmid nucleotide sequence in all 6 frames are also provided. Further analysis and discussion of the dataset is presented in an accompanying research article: “Ordering the mob: insights into replicon and MOB typing…” (Orlek et al., 2017) [1]. The curated plasmid sequences are publicly available in the Figshare repository.
http://www.sciencedirect.com/science/article/pii/S2352340917301567
Plasmids
Sequence data curation
Complete genomes
Enterobacteriaceae family
collection DOAJ
language English
format Article
sources DOAJ
author Alex Orlek
Hang Phan
Anna E. Sheppard
Michel Doumith
Matthew Ellington
Tim Peto
Derrick Crook
A. Sarah Walker
Neil Woodford
Muna F. Anjum
Nicole Stoesser
title A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database
publisher Elsevier
series Data in Brief
issn 2352-3409
publishDate 2017-06-01
description Thousands of plasmid sequences are now publicly available in the NCBI nucleotide database, but they are not reliably annotated to distinguish complete plasmids from plasmid fragments, such as gene or contig sequences; therefore, retrieving complete plasmids for downstream analyses is challenging. Here we present a curated dataset of complete bacterial plasmids from the clinically relevant Enterobacteriaceae family. The dataset was compiled from the NCBI nucleotide database using curation steps designed to exclude incomplete plasmid sequences and chromosomal sequences misannotated as plasmids. Over 2000 complete plasmid sequences are included in the curated plasmid dataset. Protein sequences produced by translating each complete plasmid nucleotide sequence in all 6 frames are also provided. Further analysis and discussion of the dataset are presented in an accompanying research article: “Ordering the mob: insights into replicon and MOB typing…” (Orlek et al., 2017) [1]. The curated plasmid sequences are publicly available in the Figshare repository.
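The description mentions protein sequences obtained by translating each plasmid in all 6 frames. The record does not say which tool or genetic code the authors used, so the following is only a minimal sketch of six-frame translation under stated assumptions: it uses the standard genetic code (whose amino acid assignments match the bacterial code), treats the sequence as linear (so codons spanning the junction of a circular plasmid are not covered), and marks codons containing ambiguous bases as `X`. The function names are hypothetical and not taken from the article.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# record does not state which translation table the authors used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate both strands at offsets 0, 1 and 2 (frames +1..+3, -1..-3)."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy input, not a sequence from the dataset.
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

Stop codons are kept as `*` rather than used to split the output, so each frame is returned as one continuous string; splitting into open reading frames would be a separate step.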
topic Plasmids
Sequence data curation
Complete genomes
Enterobacteriaceae family
url http://www.sciencedirect.com/science/article/pii/S2352340917301567
_version_ 1724909102533967872