Use of application containers and workflows for genomic data analysis

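Background: The rapid acquisition of biological data and development of computationally intensive analyses has led to a need for novel approaches to software deployment. In particular, the complexity of common analytic tools for genomics makes them difficult to deploy and decreases the reproducibility of computational experiments. Methods: Recent technologies that allow for application virtualization, such as Docker, allow developers and bioinformaticians to isolate these applications and deploy secure, scalable platforms that have the potential to dramatically increase the efficiency of big data processing. Results: While limitations exist, this study demonstrates a successful implementation of a pipeline with several discrete software applications for the analysis of next-generation sequencing (NGS) data. Conclusions: With this approach, we significantly reduced the amount of time needed to perform clonal analysis from NGS data in acute myeloid leukemia.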

Bibliographic Details
Main Authors: Wade L Schulz, Thomas Durant, Alexa J Siddon, Richard Torres
Format: Article
Language: English
Published: Wolters Kluwer Medknow Publications 2016-01-01
Series: Journal of Pathology Informatics
Subjects: Big data, bioinformatics workflow, containerization, genomics
Online Access: http://www.jpathinformatics.org/article.asp?issn=2153-3539;year=2016;volume=7;issue=1;spage=53;epage=53;aulast=Schulz
id doaj-71c5a92c8c4740d6a8fd9fcc1012306d
record_format Article
spelling doaj-71c5a92c8c4740d6a8fd9fcc1012306d; 2020-11-24T21:19:20Z; eng; Wolters Kluwer Medknow Publications; Journal of Pathology Informatics; ISSN 2153-3539; 2016-01-01; volume 7, issue 1, page 53; doi:10.4103/2153-3539.197197
collection DOAJ
language English
format Article
sources DOAJ
author Wade L Schulz
Thomas Durant
Alexa J Siddon
Richard Torres
title Use of application containers and workflows for genomic data analysis
publisher Wolters Kluwer Medknow Publications
series Journal of Pathology Informatics
issn 2153-3539
publishDate 2016-01-01
description Background: The rapid acquisition of biological data and development of computationally intensive analyses has led to a need for novel approaches to software deployment. In particular, the complexity of common analytic tools for genomics makes them difficult to deploy and decreases the reproducibility of computational experiments. Methods: Recent technologies that allow for application virtualization, such as Docker, allow developers and bioinformaticians to isolate these applications and deploy secure, scalable platforms that have the potential to dramatically increase the efficiency of big data processing. Results: While limitations exist, this study demonstrates a successful implementation of a pipeline with several discrete software applications for the analysis of next-generation sequencing (NGS) data. Conclusions: With this approach, we significantly reduced the amount of time needed to perform clonal analysis from NGS data in acute myeloid leukemia.
topic Big data
bioinformatics workflow
containerization
genomics
url http://www.jpathinformatics.org/article.asp?issn=2153-3539;year=2016;volume=7;issue=1;spage=53;epage=53;aulast=Schulz