ERROR CORRECTION METHOD FOR SEQUENCING DATA WITH INSERTIONS AND DELETIONS

Subject of Research.A method for error correction for sequencing reads of a haploid organism with insertions and deletions was developed. It was tested on two libraries: a synthesized dataset for Escherichia coli bacterium and a real dataset of reads for Pseudomonas stutzeri. Method. The method is b...

Full description

Bibliographic Details
Main Authors: A. V. Alexandrov, A. A. Shalyto
Format: Article
Language:English
Published: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2016-01-01
Series:Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Subjects:
Online Access:http://ntv.ifmo.ru/file/article/14578.pdf
Description
Summary:Subject of Research.A method for error correction for sequencing reads of a haploid organism with insertions and deletions was developed. It was tested on two libraries: a synthesized dataset for Escherichia coli bacterium and a real dataset of reads for Pseudomonas stutzeri. Method. The method is based on using k-mers but only for finding reads that are close to each other. For the close reads a consensus string is created which is then used for correcting errors in the initial reads. Main Results. The algorithm is implemented as a separated program. The program has been tested on both real and synthesized data. The method performance is higher than that of the other known methods (N50 metric was used as well as total contig length and maximal contig length as metrics for comparison). Practical Relevance. The method can be used together with known genome assembly methods not suitable for application with the reads containing insertion and deletion errors.
ISSN:2226-1494
2500-0373