DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network

Abstract Background: Calling genetic variations from sequence reads is an important problem in genomics. There are many existing methods for calling various types of variations. Recently, Google developed a method for calling single nucleotide polymorphisms (SNPs) based on deep learning. Their method...

Bibliographic Details
Main Authors: Lei Cai, Yufeng Wu, Jingyang Gao
Format: Article
Language: English
Published: BMC 2019-12-01
Series: BMC Bioinformatics
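Subjects: Structural variations; Deep learning; Feature extraction; Visualization; Genetic variations; High-throughput sequencing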
Online Access: https://doi.org/10.1186/s12859-019-3299-y
id doaj-e2ce3500fec243979c89fbeb2dffd6db
record_format Article
spelling doaj-e2ce3500fec243979c89fbeb2dffd6db
2020-12-13T12:41:53Z
eng
BMC
BMC Bioinformatics
1471-2105
2019-12-01
20(1), 1-17
10.1186/s12859-019-3299-y
DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network
Lei Cai (Department of Information Science and Technology, Beijing University of Chemical Technology)
Yufeng Wu (Department of Computer Science and Engineering, University of Connecticut)
Jingyang Gao (Department of Information Science and Technology, Beijing University of Chemical Technology)
Abstract Background: Calling genetic variations from sequence reads is an important problem in genomics. There are many existing methods for calling various types of variations. Recently, Google developed a method for calling single nucleotide polymorphisms (SNPs) based on deep learning. Their method visualizes sequence reads in the form of images. These images are then used to train a deep neural network model, which is used to call SNPs. This raises a research question: can deep learning be used to call more complex genetic variations such as structural variations (SVs) from sequence data? Results: In this paper, we extend this high-level approach to the problem of calling structural variations. We present DeepSV, an approach based on deep learning for calling long deletions from sequence reads. DeepSV is based on a novel method of visualizing sequence reads. The visualization is designed to capture multiple sources of information in the sequence data that are relevant to long deletions. DeepSV also implements techniques for working with noisy training data. DeepSV trains a model from the visualized sequence reads and calls deletions based on this model. We demonstrate that DeepSV outperforms existing methods in terms of accuracy and efficiency of deletion calling on data from the 1000 Genomes Project. Conclusions: Our work shows that deep learning can potentially lead to effective calling of different types of genetic variations that are more complex than SNPs.
https://doi.org/10.1186/s12859-019-3299-y
Structural variations
Deep learning
Feature extraction
Visualization
Genetic variations
High-throughput sequencing
collection DOAJ
language English
format Article
sources DOAJ
author Lei Cai
Yufeng Wu
Jingyang Gao
title DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-12-01
description Abstract Background: Calling genetic variations from sequence reads is an important problem in genomics. There are many existing methods for calling various types of variations. Recently, Google developed a method for calling single nucleotide polymorphisms (SNPs) based on deep learning. Their method visualizes sequence reads in the form of images. These images are then used to train a deep neural network model, which is used to call SNPs. This raises a research question: can deep learning be used to call more complex genetic variations such as structural variations (SVs) from sequence data? Results: In this paper, we extend this high-level approach to the problem of calling structural variations. We present DeepSV, an approach based on deep learning for calling long deletions from sequence reads. DeepSV is based on a novel method of visualizing sequence reads. The visualization is designed to capture multiple sources of information in the sequence data that are relevant to long deletions. DeepSV also implements techniques for working with noisy training data. DeepSV trains a model from the visualized sequence reads and calls deletions based on this model. We demonstrate that DeepSV outperforms existing methods in terms of accuracy and efficiency of deletion calling on data from the 1000 Genomes Project. Conclusions: Our work shows that deep learning can potentially lead to effective calling of different types of genetic variations that are more complex than SNPs.
topic Structural variations
Deep learning
Feature extraction
Visualization
Genetic variations
High-throughput sequencing
url https://doi.org/10.1186/s12859-019-3299-y