An effective method to resolve ambiguous bisulfite-treated reads

Background: The combination of bisulfite treatment and next-generation sequencing is an important method for methylation analysis, and aligning the bisulfite-treated reads (BS-reads) is the critical step for downstream applications. As bisulfite treatment reduces the complexity of the se...

Bibliographic Details
Main Authors: Liu, M. (Author), Xu, Y. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Subjects:
DNA
Online Access: View Fulltext in Publisher
LEADER 03000nam a2200481Ia 4500
001 10.1186-s12859-021-04204-6
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a An effective method to resolve ambiguous bisulfite-treated reads 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04204-6 
520 3 |a Background: The combination of bisulfite treatment and next-generation sequencing is an important method for methylation analysis, and aligning the bisulfite-treated reads (BS-reads) is the critical step for downstream applications. Because bisulfite treatment reduces the complexity of the sequences, a large portion of BS-reads may align ambiguously to multiple locations in the reference genome; such reads are called multireads. Multireads cannot be used in downstream applications because they are likely to introduce artifacts. To identify the best mapping location of each multiread, existing Bayesian methods calculate the probability of the read at each position by considering how it overlaps with uniquely mapped reads. However, ~25% of multireads do not overlap any uniquely mapped read and therefore cannot be resolved by existing methods. Results: Here we propose a novel method (EM-MUL) that not only rescues multireads that overlap unique reads, but also uses overall coverage and accurate base-level alignment to resolve multireads that current methods cannot handle. We benchmark our method on both simulated and real datasets. Experimental results show that it aligns more than 80% of multireads to the best mapping position with very high accuracy. Conclusions: EM-MUL is an effective method for accurately determining the best mapping position of multireads among BS-reads. For downstream applications, it helps improve methylation resolution in repetitive regions of the genome. EM-MUL is freely available at https://github.com/lmylynn/EM-MUL. © 2021, The Author(s). 
650 0 4 |a Alkylation 
650 0 4 |a Bayes theorem 
650 0 4 |a Bayes Theorem 
650 0 4 |a Bayesian 
650 0 4 |a Bisulfite 
650 0 4 |a Critical steps 
650 0 4 |a DNA 
650 0 4 |a DNA methylation 
650 0 4 |a DNA Methylation 
650 0 4 |a DNA sequence 
650 0 4 |a Downstream applications 
650 0 4 |a high throughput sequencing 
650 0 4 |a High-accuracy 
650 0 4 |a High-Throughput Nucleotide Sequencing 
650 0 4 |a hydrogen sulfite 
650 0 4 |a Mapping 
650 0 4 |a Methylation 
650 0 4 |a Methylation analysis 
650 0 4 |a Multireads 
650 0 4 |a Next-generation sequencing 
650 0 4 |a Real data sets 
650 0 4 |a Sequence Analysis, DNA 
650 0 4 |a Simulated datasets 
650 0 4 |a software 
650 0 4 |a Software 
650 0 4 |a sulfite 
650 0 4 |a Sulfites 
700 1 |a Liu, M.  |e author 
700 1 |a Xu, Y.  |e author 
773 |t BMC Bioinformatics
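
Illustrative note (not part of the catalog record or of EM-MUL): the abstract states that bisulfite treatment reduces sequence complexity, so many BS-reads align to multiple locations. The sketch below shows that effect on a toy example. It assumes full conversion of every C to T (no methylated cytosines), forward strand only, and exact matching; the sequences and the helper names `bisulfite_convert` and `find_hits` are hypothetical and chosen only for illustration.

```python
def bisulfite_convert(seq: str) -> str:
    """In-silico conversion: read every C as T (assumes no methylated C)."""
    return seq.upper().replace("C", "T")


def find_hits(read: str, reference: str) -> list[int]:
    """Return all 0-based start positions where read matches reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        # Continue searching one base further so overlapping hits are found too.
        start = reference.find(read, start + 1)
    return hits


# Toy reference (hypothetical): two regions that differ only at C/T positions.
reference = "ACCGTAGGATTGTA"
read = "ACCGTA"

# Before conversion the read has a single exact match.
unique_hits = find_hits(read, reference)

# After conversion both regions collapse to the same sequence,
# so the converted read matches two locations: a multiread.
converted_hits = find_hits(bisulfite_convert(read), bisulfite_convert(reference))

print("hits before conversion:", unique_hits)
print("hits after conversion: ", converted_hits)
```

In this toy case the read is unique against the original reference but ambiguous after conversion, which is the situation the abstract calls a multiread. Real bisulfite aligners also handle the reverse strand (G to A) and mismatches, which this sketch leaves out.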