Performance of neural network basecalling tools for Oxford Nanopore sequencing

Abstract. Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at ac...

Bibliographic Details
Main Authors: Ryan R. Wick, Louise M. Judd, Kathryn E. Holt
Format: Article
Language: English
Published: BMC 2019-06-01
Series: Genome Biology
Subjects: Oxford Nanopore; Basecalling; Long-read sequencing
Online Access: http://link.springer.com/article/10.1186/s13059-019-1727-y
id doaj-7f8648c885cd4e4289a3cdabb1f9fbd3
record_format Article
spelling doaj-7f8648c885cd4e4289a3cdabb1f9fbd3
2020-11-25T02:54:52Z
eng
BMC
Genome Biology
1474-760X
2019-06-01
20 1 1 10
10.1186/s13059-019-1727-y
Performance of neural network basecalling tools for Oxford Nanopore sequencing
Ryan R. Wick, Department of Infectious Diseases, Central Clinical School, Monash University
Louise M. Judd, Department of Infectious Diseases, Central Clinical School, Monash University
Kathryn E. Holt, Department of Infectious Diseases, Central Clinical School, Monash University
http://link.springer.com/article/10.1186/s13059-019-1727-y
Oxford Nanopore
Basecalling
Long-read sequencing
collection DOAJ
language English
format Article
sources DOAJ
author Ryan R. Wick
Louise M. Judd
Kathryn E. Holt
title Performance of neural network basecalling tools for Oxford Nanopore sequencing
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2019-06-01
description Abstract
Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish.
Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences (‘polishing’) with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy.
Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
topic Oxford Nanopore
Basecalling
Long-read sequencing
url http://link.springer.com/article/10.1186/s13059-019-1727-y