VDJPipe: a pipelined tool for pre-processing immune repertoire sequencing data

Abstract. Background: Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks with just a single pass over the data f...

Bibliographic Details
Main Authors: Scott Christley, Mikhail K. Levin, Inimary T. Toby, John M. Fonner, Nancy L. Monson, William H. Rounds, Florian Rubelt, Walter Scarborough, Richard H. Scheuermann, Lindsay G. Cowell
Format: Article
Language: English
Published: BMC 2017-10-01
Series: BMC Bioinformatics
Subjects: Rep-seq; Immune repertoire analysis; Bioinformatics
Online Access: http://link.springer.com/article/10.1186/s12859-017-1853-z
id doaj-26c993d5cb694e45b75adc076995ae4a
record_format Article
spelling doaj-26c993d5cb694e45b75adc076995ae4a 2020-11-24T21:17:08Z eng BMC BMC Bioinformatics 1471-2105 2017-10-01 18115 10.1186/s12859-017-1853-z
VDJPipe: a pipelined tool for pre-processing immune repertoire sequencing data
Scott Christley (0), Mikhail K. Levin (1), Inimary T. Toby (2), John M. Fonner (3), Nancy L. Monson (4), William H. Rounds (5), Florian Rubelt (6), Walter Scarborough (7), Richard H. Scheuermann (8), Lindsay G. Cowell (9)
Department of Clinical Sciences, UT Southwestern Medical Center
Bank of America Corporate Center
Department of Clinical Sciences, UT Southwestern Medical Center
Texas Advanced Computing Center
Department of Neurology and Neurotherapeutics, UT Southwestern Medical Center
Department of Clinical Sciences, UT Southwestern Medical Center
Department of Microbiology and Immunology, Stanford University School of Medicine
Texas Advanced Computing Center
J. Craig Venter Institute
Department of Clinical Sciences, UT Southwestern Medical Center
collection DOAJ
language English
format Article
sources DOAJ
author Scott Christley
Mikhail K. Levin
Inimary T. Toby
John M. Fonner
Nancy L. Monson
William H. Rounds
Florian Rubelt
Walter Scarborough
Richard H. Scheuermann
Lindsay G. Cowell
title VDJPipe: a pipelined tool for pre-processing immune repertoire sequencing data
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2017-10-01
description Abstract. Background: Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks with just a single pass over the data files. Results: Processing tasks provided by VDJPipe include base composition statistics calculation, read quality statistics calculation, quality filtering, homopolymer filtering, length and nucleotide filtering, paired-read merging, barcode demultiplexing, 5′ and 3′ PCR primer matching, and duplicate-read collapsing. VDJPipe uses a pipeline approach in which multiple processing steps are performed in a sequential workflow, with the output of each step passed automatically as input to the next step. The workflow is flexible enough to handle the complex barcoding schemes used in many immunosequencing experiments. Because VDJPipe is designed for computational efficiency, we evaluated its performance by comparing execution times with those of pRESTO, a widely used pre-processing tool for immune repertoire sequencing data. We found that VDJPipe requires <10% of the run time required by pRESTO. Conclusions: VDJPipe is a high-performance tool that is optimized for pre-processing large immune repertoire sequencing data sets.
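The pipeline idea in the description (each read flows through a sequence of steps, with the output of one step becoming the input of the next, in a single pass over the data) can be illustrated with a minimal Python sketch. This is not VDJPipe's code, configuration format, or API: the `Read` type, the step functions, the thresholds, and the sample reads are all hypothetical and chosen only for illustration, and only a few of the listed tasks (quality, homopolymer, and length filtering, plus duplicate collapsing) are mimicked.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical in-memory read: name, bases, per-base Phred scores."""
    name: str
    seq: str
    quals: tuple


# A step returns the (possibly modified) read, or None to discard it.
Step = Callable[[Read], Optional[Read]]


def min_mean_quality(threshold: float) -> Step:
    def step(read: Read) -> Optional[Read]:
        if not read.quals:
            return None
        mean = sum(read.quals) / len(read.quals)
        return read if mean >= threshold else None
    return step


def max_homopolymer(max_run: int) -> Step:
    def step(read: Read) -> Optional[Read]:
        longest = run = 0
        prev = ""
        for base in read.seq:
            run = run + 1 if base == prev else 1
            prev = base
            longest = max(longest, run)
        return read if longest <= max_run else None
    return step


def min_length(n: int) -> Step:
    return lambda read: read if len(read.seq) >= n else None


def drop_duplicates() -> Step:
    seen = set()  # sequences already emitted; the first occurrence is kept

    def step(read: Read) -> Optional[Read]:
        if read.seq in seen:
            return None
        seen.add(read.seq)
        return read
    return step


def run_pipeline(reads: Iterable[Read], steps: list) -> Iterator[Read]:
    """Single pass: every read visits the steps in order, once."""
    for read in reads:
        current: Optional[Read] = read
        for step in steps:
            current = step(current)
            if current is None:
                break  # discarded; later steps never see this read
        if current is not None:
            yield current


reads = [
    Read("r1", "ACGTACGT", (30,) * 8),
    Read("r2", "ACGTACGT", (30,) * 8),  # same sequence as r1
    Read("r3", "AAAAAAAA", (30,) * 8),  # homopolymer run of 8
    Read("r4", "ACGTTGCA", (5,) * 8),   # low mean quality
    Read("r5", "ACG", (30,) * 3),       # too short
    Read("r6", "TTGACCAG", (35,) * 8),
]
steps = [min_mean_quality(20), max_homopolymer(5), min_length(5), drop_duplicates()]
kept = [r.name for r in run_pipeline(reads, steps)]
print(kept)
```

Because `run_pipeline` is a generator, the input is read once and no intermediate read sets are stored between steps; only the duplicate step keeps state (the set of sequences already seen). This is the design property the description attributes to a single-pass pipeline, shown here only in toy form.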
topic Rep-seq
Immune repertoire analysis
Bioinformatics
url http://link.springer.com/article/10.1186/s12859-017-1853-z
_version_ 1726013967047852032