Long-Read Metagenomics Improves the Recovery of Viral Diversity from Complex Natural Marine Samples

Bibliographic Details
Main Authors: Haro-Moreno, J.M. (Author), López-Pérez, M. (Author), Rodriguez-Valera, F. (Author), Zaragoza-Solas, A. (Author)
Format: Article
Language: English
Published: American Society for Microbiology 2022
Subjects:
sea
Online Access:View Fulltext in Publisher
Description
Summary: The recovery of DNA from viromes is a major obstacle in the use of long-read sequencing to study their genomes. For this reason, the use of cellular metagenomes (>0.2-µm size range) emerges as an interesting complementary tool, since they contain large amounts of naturally amplified viral genomes from prelytic replication. We have applied second-generation (Illumina NextSeq; short reads) and third-generation (PacBio Sequel II; long reads) sequencing to compare the diversity and features of the viral community in a marine sample obtained from offshore waters of the western Mediterranean. We found that a major wedge of the expected marine viral diversity was directly recovered by the raw PacBio circular consensus sequencing (CCS) reads. More than 30,000 sequences were detected only in this data set, with no homologues in the long- and short-read assembly, and ca. 26,000 had no homologues in the large data set of the Global Ocean Virome 2 (GOV2), highlighting the information gap created by the assembly bias. At the level of complete viral genomes, the performance was similar in both approaches. However, the hybrid long- and short-read assembly provided the longest average length of the sequences and improved the host assignment. Although no novel major clades of viruses were found, there was an increase in the intraclade genomic diversity recovered by long reads that produced an enriched assessment of the real diversity and allowed the discovery of novel genes with biotechnological potential (e.g., endolysin genes). © 2022 Zaragoza-Solas et al.
ISSN: 2379-5077
DOI:10.1128/msystems.00192-22