PlasClass improves plasmid sequence classification.

Many bacteria contain plasmids, but separating between contigs that originate on the plasmid and those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of pla...

Bibliographic Details
Main Authors: David Pellow, Itzik Mizrahi, Ron Shamir
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-04-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1007781
id doaj-835e5fbc8a8045ef943c3510129a2824
record_format Article
spelling doaj-835e5fbc8a8045ef943c3510129a2824 | 2021-04-21T15:15:18Z | eng | Public Library of Science (PLoS) | PLoS Computational Biology | 1553-734X | 1553-7358 | 2020-04-01 | 16 | 4 | e1007781 | 10.1371/journal.pcbi.1007781 | PlasClass improves plasmid sequence classification. | David Pellow | Itzik Mizrahi | Ron Shamir | https://doi.org/10.1371/journal.pcbi.1007781
collection DOAJ
language English
format Article
sources DOAJ
author David Pellow
Itzik Mizrahi
Ron Shamir
title PlasClass improves plasmid sequence classification.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2020-04-01
description Many bacteria contain plasmids, but separating between contigs that originate on the plasmid and those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of plasmid origin give less reliable results for shorter sequences, are trained using a fraction of the known plasmids, and can be difficult to use in practice. We present PlasClass, a new plasmid classifier. It uses a set of standard classifiers trained on the most current set of known plasmid sequences for different sequence lengths. We tested PlasClass sequence classification on held-out data and simulations, as well as publicly available bacterial isolates and plasmidome samples and plasmids assembled from metagenomic samples. PlasClass outperforms the state-of-the-art plasmid classification tool on shorter sequences, which constitute the majority of assembly contigs, allowing it to achieve higher F1 scores in classifying sequences from a wide range of datasets. PlasClass also uses significantly less time and memory. PlasClass can be used to easily classify plasmid and bacterial genome sequences in metagenomic or isolate assemblies. It is available under the MIT license from: https://github.com/Shamir-Lab/PlasClass.
url https://doi.org/10.1371/journal.pcbi.1007781
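
The description above says PlasClass uses a set of standard classifiers trained for different sequence lengths and reports results as F1 scores. The following is a minimal sketch of that general idea, not the tool's actual implementation or API: the length bins, the k-mer featurization, the `models` callables, and every function name here are assumptions introduced purely for illustration.

```python
from bisect import bisect_left
from collections import Counter
from itertools import product

# Hypothetical length bins in base pairs; the record does not state the real values.
LENGTH_BINS = [1_000, 10_000, 100_000]


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies over the ACGT alphabet (assumed features)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def pick_bin(length):
    """Index of the smallest bin that is >= length, or the last bin if none is."""
    return min(bisect_left(LENGTH_BINS, length), len(LENGTH_BINS) - 1)


def classify(seq, models, threshold=0.5):
    """Score a contig with the model for its length bin; True means 'plasmid'.

    `models` is a list of callables (one per bin) mapping a feature vector
    to a plasmid probability. They stand in for trained classifiers.
    """
    model = models[pick_bin(len(seq))]
    return model(kmer_frequencies(seq)) >= threshold


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Selecting a model by contig length lets each classifier be fitted to sequences of comparable size, which is one plausible way to address the weaker results on short sequences that the description attributes to existing tools.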