DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework

Although sequencing a human genome has become affordable, identifying genetic variants from whole-genome sequence data is still a hurdle for researchers without adequate computing equipment or bioinformatics support. GATK is a gold standard method for the identification of genetic variants and has b...


Bibliographic Details
Main Authors: Po-Jung Huang, Jui-Huan Chang, Hou-Hsien Lin, Yu-Xuan Li, Chi-Ching Lee, Chung-Tsai Su, Yun-Lung Li, Ming-Tai Chang, Sid Weng, Wei-Hung Cheng, Cheng-Hsun Chiu, Petrus Tang
Format: Article
Language: English
Published: Hindawi Limited 2020-01-01
Series: Computational and Mathematical Methods in Medicine
Online Access: http://dx.doi.org/10.1155/2020/7231205
id doaj-c3da8d6dd0dc433c946eaeb37df4364d
record_format Article
spelling doaj-c3da8d6dd0dc433c946eaeb37df4364d
2020-11-25T03:52:09Z
eng
Hindawi Limited
Computational and Mathematical Methods in Medicine
1748-670X
1748-6718
2020-01-01
2020
10.1155/2020/7231205
7231205
DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework
Po-Jung Huang (0)
Jui-Huan Chang (1)
Hou-Hsien Lin (2)
Yu-Xuan Li (3)
Chi-Ching Lee (4)
Chung-Tsai Su (5)
Yun-Lung Li (6)
Ming-Tai Chang (7)
Sid Weng (8)
Wei-Hung Cheng (9)
Cheng-Hsun Chiu (10)
Petrus Tang (11)
(0) Department of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
(1) Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
(2) Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan
(3) Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
(4) Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
(5) Department of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
(6) Department of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
(7) Department of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
(8) Department of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
(9) Department of Parasitology, College of Medicine, Chang Gung University, Taoyuan, Taiwan
(10) Genomic Medicine Core Laboratory, Chang Gung Memorial Hospital, Linkou, Taiwan
(11) Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
Although sequencing a human genome has become affordable, identifying genetic variants from whole-genome sequence data is still a hurdle for researchers without adequate computing equipment or bioinformatics support. GATK is a gold standard method for the identification of genetic variants and has been widely used in genome projects and population genetic studies for many years. That changed when the Google Brain team developed a new method, DeepVariant, which uses deep neural networks to construct an image classification model that identifies genetic variants. However, the superior accuracy of DeepVariant comes at the cost of computational intensity, which largely constrains its applications. Accordingly, we present DeepVariant-on-Spark to optimize resource allocation, enable multi-GPU support, and accelerate the processing of the DeepVariant pipeline. To make DeepVariant-on-Spark more accessible, we have deployed it to the Google Cloud Platform (GCP). Following our instructions, users can deploy DeepVariant-on-Spark on the GCP within 20 minutes and start to analyze at least ten whole-genome sequencing datasets using free credits provided by the GCP. DeepVariant-on-Spark is freely available for small-scale genome analysis using a cloud-based computing framework, which is suitable for pilot testing or preliminary study, while reserving the flexibility and scalability for large-scale sequencing projects.
http://dx.doi.org/10.1155/2020/7231205
collection DOAJ
language English
format Article
sources DOAJ
author Po-Jung Huang
Jui-Huan Chang
Hou-Hsien Lin
Yu-Xuan Li
Chi-Ching Lee
Chung-Tsai Su
Yun-Lung Li
Ming-Tai Chang
Sid Weng
Wei-Hung Cheng
Cheng-Hsun Chiu
Petrus Tang
author_sort Po-Jung Huang
title DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework
title_sort deepvariant-on-spark: small-scale genome analysis using a cloud-based computing framework
publisher Hindawi Limited
series Computational and Mathematical Methods in Medicine
issn 1748-670X
1748-6718
publishDate 2020-01-01
description Although sequencing a human genome has become affordable, identifying genetic variants from whole-genome sequence data is still a hurdle for researchers without adequate computing equipment or bioinformatics support. GATK is a gold standard method for the identification of genetic variants and has been widely used in genome projects and population genetic studies for many years. That changed when the Google Brain team developed a new method, DeepVariant, which uses deep neural networks to construct an image classification model that identifies genetic variants. However, the superior accuracy of DeepVariant comes at the cost of computational intensity, which largely constrains its applications. Accordingly, we present DeepVariant-on-Spark to optimize resource allocation, enable multi-GPU support, and accelerate the processing of the DeepVariant pipeline. To make DeepVariant-on-Spark more accessible, we have deployed it to the Google Cloud Platform (GCP). Following our instructions, users can deploy DeepVariant-on-Spark on the GCP within 20 minutes and start to analyze at least ten whole-genome sequencing datasets using free credits provided by the GCP. DeepVariant-on-Spark is freely available for small-scale genome analysis using a cloud-based computing framework, which is suitable for pilot testing or preliminary study, while reserving the flexibility and scalability for large-scale sequencing projects.
url http://dx.doi.org/10.1155/2020/7231205
_version_ 1715099429382389760