Reads Binning Improves the Assembly of Viral Genome Sequences From Metagenomic Samples

Metagenomes can be considered mixtures of viral, bacterial, and eukaryotic DNA sequences. Mining viral sequences from metagenomes could shed light on virus–host relationships and expand viral databases. Current alignment-based methods are unsuitable for identifying viral sequences from...

Bibliographic Details
Main Author: Kai Song
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-05-01
Series: Frontiers in Microbiology
Subjects: metagenome; Markov chain; virus; assembly; contigs
Online Access: https://www.frontiersin.org/articles/10.3389/fmicb.2021.664560/full
id doaj-a742e02c94de4a90a20c0920884145ee
record_format Article
spelling doaj-a742e02c94de4a90a20c0920884145ee; 2021-05-21T04:27:08Z; eng; Frontiers Media S.A.; Frontiers in Microbiology; 1664-302X; 2021-05-01; 12; 10.3389/fmicb.2021.664560; 664560; Reads Binning Improves the Assembly of Viral Genome Sequences From Metagenomic Samples; Kai Song; https://www.frontiersin.org/articles/10.3389/fmicb.2021.664560/full; metagenome; Markov chain; virus; assembly; contigs
collection DOAJ
language English
format Article
sources DOAJ
author Kai Song
title Reads Binning Improves the Assembly of Viral Genome Sequences From Metagenomic Samples
publisher Frontiers Media S.A.
series Frontiers in Microbiology
issn 1664-302X
publishDate 2021-05-01
description Metagenomes can be considered mixtures of viral, bacterial, and eukaryotic DNA sequences. Mining viral sequences from metagenomes could shed light on virus–host relationships and expand viral databases. Current alignment-based methods are unsuitable for identifying viral sequences in metagenomes because most assembled metagenomic contigs are short and contain few or no predicted genes, and most metagenomic viral genes are dissimilar to known viral genes. In this study, I developed a Markov model-based method, VirMC, to identify viral sequences in metagenomic data. VirMC uses Markov chains to model sequence signatures and constructs a scoring model, based on a likelihood test, to distinguish viral from bacterial sequences. Compared with two other state-of-the-art viral sequence-prediction methods, VirFinder and PPR-Meta, VirMC outperformed VirFinder and performed similarly to PPR-Meta on contigs shorter than 400 bp. VirMC outperformed both VirFinder and PPR-Meta at identifying viral sequences in metagenomic samples contaminated with eukaryotic sequences. VirMC also improved the assembly of viral genome sequences from metagenomic data by filtering out potential bacterial reads before assembly. Applying VirMC to human gut metagenomes from healthy subjects and patients with type-2 diabetes (T2D) revealed that viral contigs could help distinguish healthy from diseased status. This alignment-free method complements gene-based alignment approaches and will significantly improve the precision of viral sequence identification.
topic metagenome
Markov chain
virus
assembly
contigs
url https://www.frontiersin.org/articles/10.3389/fmicb.2021.664560/full
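
The description above says VirMC models sequence signatures with Markov chains and scores sequences with a likelihood test. The following is a minimal sketch of that general idea only, not the author's implementation: it trains one fixed-order Markov chain per class and scores a sequence by the log-likelihood ratio between the two models. The function names, the chain order, the pseudocount, and the toy training sequences are all assumptions made for illustration; the record does not state the parameters VirMC actually uses.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate log P(next base | previous `order` bases) with add-one style smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous bases such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    for context in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[context][b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        model[context] = {
            b: math.log((counts[context][b] + pseudocount) / total) for b in ALPHABET
        }
    return model


def log_likelihood(seq, model, order=2):
    """Sum log transition probabilities along the sequence (ambiguous windows are skipped)."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if context in model and base in model[context]:
            total += model[context][base]
    return total


def llr_score(seq, viral_model, bacterial_model, order=2):
    """Log-likelihood ratio: positive means the viral model explains the sequence better."""
    return log_likelihood(seq, viral_model, order) - log_likelihood(seq, bacterial_model, order)


# Toy training data (hypothetical, chosen only to make the two classes separable).
viral_model = train_markov(["ATATATATATATATAT"])
bacterial_model = train_markov(["GCGCGCGCGCGCGCGC"])

viral_like = llr_score("ATATATAT", viral_model, bacterial_model)
bacterial_like = llr_score("GCGCGCGC", viral_model, bacterial_model)
```

With real data, the two models would be trained on reference viral and bacterial genomes, and a score threshold (another assumption here) would decide which reads or contigs to keep before assembly.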