Paired-end small RNA sequencing reveals a possible overestimation in the isomiR sequence repertoire previously reported from conventional single read data analysis

Background: Next generation sequencing has allowed the discovery of miRNA isoforms, termed isomiRs. Some isomiRs are derived from imprecise processing of pre-miRNA precursors, leading to length variants. Additional variability is introduced by non-templated addition of bases at the ends or editing o...


Bibliographic Details
Main Authors: Luna de Haro, A. (Author), Pluvinet, R. (Author), Sanchez Herrero, J.F. (Author), Sumoy, L. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Subjects: RNA
Online Access: View Fulltext in Publisher (https://doi.org/10.1186/s12859-021-04128-1)
LEADER 02957nam a2200457Ia 4500
001 10.1186-s12859-021-04128-1
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Paired-end small RNA sequencing reveals a possible overestimation in the isomiR sequence repertoire previously reported from conventional single read data analysis 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04128-1 
520 3 |a Background: Next generation sequencing has allowed the discovery of miRNA isoforms, termed isomiRs. Some isomiRs are derived from imprecise processing of pre-miRNA precursors, leading to length variants. Additional variability is introduced by non-templated addition of bases at the ends or editing of internal bases, resulting in base differences relative to the template DNA sequence. We hypothesized that some component of the isomiR variation reported so far could be due to systematic technical noise and not real. Results: We have developed the XICRA pipeline to analyze small RNA sequencing data at the isomiR level. We exploited its ability to use single or merged reads to compare isomiR results derived from paired-end (PE) reads with those from single reads (SR) to address whether detectable sequence differences relative to canonical miRNAs found in isomiRs are true biological variations or the result of errors in sequencing. We have detected non-negligible systematic differences between SR and PE data which primarily affect putative internally edited isomiRs, and at a much smaller frequency terminal length changing isomiRs. This is relevant for the identification of true isomiRs in small RNA sequencing datasets. Conclusions: We conclude that potential artifacts derived from sequencing errors and/or data processing could result in an overestimation of abundance and diversity of miRNA isoforms. Efforts in annotating the isomiRnome should take this into account. © 2021, The Author(s). 
650 0 4 |a Biological variation 
650 0 4 |a data analysis 
650 0 4 |a Data Analysis 
650 0 4 |a Data handling 
650 0 4 |a genetics 
650 0 4 |a high throughput sequencing 
650 0 4 |a High-Throughput Nucleotide Sequencing 
650 0 4 |a IsomiR 
650 0 4 |a Length variants 
650 0 4 |a microRNA 
650 0 4 |a MicroRNAs 
650 0 4 |a miRNA 
650 0 4 |a Next-generation sequencing 
650 0 4 |a Paired-end sequencing 
650 0 4 |a RNA 
650 0 4 |a sequence analysis 
650 0 4 |a Sequence Analysis, RNA 
650 0 4 |a Sequencing errors 
650 0 4 |a Small RNA 
650 0 4 |a Technical noise 
650 0 4 |a Templated 
650 0 4 |a Terminal lengths 
650 0 4 |a whole exome sequencing 
650 0 4 |a Whole Exome Sequencing 
700 1 |a Luna de Haro, A.  |e author 
700 1 |a Pluvinet, R.  |e author 
700 1 |a Sanchez Herrero, J.F.  |e author 
700 1 |a Sumoy, L.  |e author 
773 |t BMC Bioinformatics