VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites

Bibliographic Details
Main Authors: Giulio Spinozzi, Andrea Calabria, Stefano Brasca, Stefano Beretta, Ivan Merelli, Luciano Milanesi, Eugenio Montini
Format: Article
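Language: English
Published: BMC 2017-11-01
Series: BMC Bioinformatics
Subjects: Open source software; Bioinformatics pipeline; Integration site analysis; Gene therapy; High-throughput sequencing; Next-generation sequencing
Online Access: http://link.springer.com/article/10.1186/s12859-017-1937-9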
id doaj-09c3c83637b24ceaa6e89a6a947c9101
record_format Article
spelling doaj-09c3c83637b24ceaa6e89a6a947c9101 (record updated 2020-11-25T00:39:41Z)
Language: eng
Publisher: BMC
Journal: BMC Bioinformatics, ISSN 1471-2105
Published: 2017-11-01, volume 18, issue 1, pages 1-12
DOI: 10.1186/s12859-017-1937-9
Author affiliations:
Giulio Spinozzi: San Raffaele Telethon Institute for Gene Therapy (SR-Tiget), IRCCS, San Raffaele Scientific Institute
Andrea Calabria: San Raffaele Telethon Institute for Gene Therapy (SR-Tiget), IRCCS, San Raffaele Scientific Institute
Stefano Brasca: San Raffaele Telethon Institute for Gene Therapy (SR-Tiget), IRCCS, San Raffaele Scientific Institute
Stefano Beretta: Department of Computer Science, University of Milano Bicocca
Ivan Merelli: National Research Council, Institute for Biomedical Technologies
Luciano Milanesi: National Research Council, Institute for Biomedical Technologies
Eugenio Montini: San Raffaele Telethon Institute for Gene Therapy (SR-Tiget), IRCCS, San Raffaele Scientific Institute
collection DOAJ
language English
format Article
sources DOAJ
author Giulio Spinozzi
Andrea Calabria
Stefano Brasca
Stefano Beretta
Ivan Merelli
Luciano Milanesi
Eugenio Montini
title VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2017-11-01
description Abstract
Background: Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The increasing number of gene therapy clinical trials, combined with the increasing amount of Next Generation Sequencing data aimed at identifying integration sites, requires highly accurate and efficient computational software able to correctly process “big data” in a reasonable computational time.
Results: Here we present VISPA2 (Vector Integration Site Parallel Analysis, version 2), the latest optimized computational pipeline for integration site identification and analysis, with the following features: (1) sequence analysis for integration site processing that is fully compliant with paired-end reads and includes a sequence quality filter before and after alignment to the target genome; (2) a heuristic algorithm that reduces false positive integration sites at the nucleotide level, limiting the impact of Polymerase Chain Reaction or trimming/alignment artifacts; (3) a classification and annotation module for integration sites; (4) a user-friendly web interface that serves as a front-end for researchers to perform integration site analyses without computational skills; (5) a speedup of all steps through parallelization (Hadoop-free).
Conclusions: We tested the performance of VISPA2 using simulated and real datasets of lentiviral vector integration sites, previously obtained from patients enrolled in a hematopoietic stem cell gene therapy clinical trial, and compared the results with other preexisting tools for integration site analysis. On the computational side, VISPA2 showed a more than 6-fold speedup and improved precision and recall metrics (1 and 0.97, respectively) compared to previously developed computational pipelines. These results indicate that VISPA2 is a fast, reliable and user-friendly tool for integration site analysis, which allows gene therapy integration data to be handled in a cost- and time-effective fashion. Moreover, the web access of VISPA2 (http://openserver.itb.cnr.it/vispa/) makes a complex analytical tool accessible and easy to use for researchers. We released the source code of VISPA2 in a public repository (https://bitbucket.org/andreacalabria/vispa2).
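The precision and recall figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that integration sites are compared as exact (chromosome, position, strand) triples; this record does not describe the matching criterion used in the paper, and the `Site` type, the function name, and the toy data are hypothetical, not part of VISPA2.

```python
from typing import Set, Tuple

# Hypothetical representation of an integration site: (chromosome, position, strand).
Site = Tuple[str, int, str]


def precision_recall(predicted: Set[Site], truth: Set[Site]) -> Tuple[float, float]:
    """Return (precision, recall) for predicted sites against a ground-truth set.

    Assumption: a prediction is a true positive only on an exact match of
    chromosome, position and strand.
    """
    true_positives = len(predicted & truth)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data: four simulated sites, three of which are recovered, with no false calls.
truth = {("chr1", 1000, "+"), ("chr2", 5000, "-"), ("chr3", 750, "+"), ("chrX", 42, "-")}
predicted = {("chr1", 1000, "+"), ("chr2", 5000, "-"), ("chr3", 750, "+")}

p, r = precision_recall(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case every reported site is correct, so precision is 1, while one true site is missed, so recall falls below 1. That is the same pattern as the reported values of 1 and 0.97.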
topic Open source software
Bioinformatics pipeline
Integration site analysis
Gene therapy
High-throughput sequencing
Next-generation sequencing
url http://link.springer.com/article/10.1186/s12859-017-1937-9