SAUTE: sequence assembly using target enrichment

Background: Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require finding multip...


Bibliographic Details
Main Authors:	Agarwala, R. (Author), Souvorov, A. (Author)
Format:	Article
Language:	English
Published:	BioMed Central Ltd 2021
Subjects:	algorithm; Algorithms; Antimicrobial resistance; Carry-over contamination; Coding sequences; de Bruijn graphs; De Bruijn graphs; De-novo assembly; DNA sequence; genome; Genome; Genomic regions; genomics; Genomics; high throughput sequencing; High-Throughput Nucleotide Sequencing; Illumina reads; Protein sequences; Proteins; RNA; RNA-seq; RNA-Seq; Sequence Analysis, DNA; Sequence assemblies; Shovels; Target proteins; Target sequences
Online Access:	View Fulltext in Publisher


LEADER	02730nam a2200469Ia 4500
001	10.1186-s12859-021-04174-9
008	220427s2021 CNT 000 0 und d
020			\|a 14712105 (ISSN)
245	1	0	\|a SAUTE: sequence assembly using target enrichment
260		0	\|b BioMed Central Ltd \|c 2021
856			\|z View Fulltext in Publisher \|u https://doi.org/10.1186/s12859-021-04174-9
520	3		\|a Background: Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require finding multiple well-supported variants for assembled regions. Results: To facilitate assembly of repeat regions and to report multiple well-supported variants when a user can provide target sequences to assist the assembly, we propose SAUTE and SAUTE_PROT assemblers. Both assemblers use a de Bruijn graph on reads. Targets can be transcripts or proteins for RNA-seq reads and transcripts, proteins, or genomic regions for genomic reads. Target sequences are nucleotide and protein sequences for SAUTE and SAUTE_PROT, respectively. Conclusions: For RNA-seq, comparisons with Trinity, rnaSPAdes, SPAligner, and SPAdes assembly of reads aligned to target proteins by DIAMOND show that SAUTE_PROT finds more coding sequences that translate to benchmark proteins. Using AMRFinderPlus calls, we find SAUTE has higher sensitivity and precision than SPAdes, plasmidSPAdes, SPAligner, and SPAdes assembly of reads aligned to target regions by HISAT2. It also has better sensitivity than SKESA but worse precision. © 2021, This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.
650	0	4	\|a algorithm
650	0	4	\|a Algorithms
650	0	4	\|a Antimicrobial resistance
650	0	4	\|a Carry-over contamination
650	0	4	\|a Coding sequences
650	0	4	\|a de Bruijn graphs
650	0	4	\|a De Bruijn graphs
650	0	4	\|a De-novo assembly
650	0	4	\|a DNA sequence
650	0	4	\|a genome
650	0	4	\|a Genome
650	0	4	\|a Genomic regions
650	0	4	\|a genomics
650	0	4	\|a Genomics
650	0	4	\|a high throughput sequencing
650	0	4	\|a High-Throughput Nucleotide Sequencing
650	0	4	\|a Illumina reads
650	0	4	\|a Protein sequences
650	0	4	\|a Proteins
650	0	4	\|a RNA
650	0	4	\|a RNA-seq
650	0	4	\|a RNA-Seq
650	0	4	\|a Sequence Analysis, DNA
650	0	4	\|a Sequence assemblies
650	0	4	\|a Shovels
650	0	4	\|a Target proteins
650	0	4	\|a Target sequences
700	1		\|a Agarwala, R. \|e author
700	1		\|a Souvorov, A. \|e author
773			\|t BMC Bioinformatics


Similar Items