Differential expression analysis using a model-based gene clustering algorithm for RNA-seq data

Background: RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering is used to classify DEGs with similar expression patterns for the subsequent analyses of data from experiments such as time-courses or multi-group comparisons.

Bibliographic Details
Main Authors: Kadota, K. (Author), Osabe, T. (Author), Shimizu, K. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Subjects:
RNA
Online Access: View Fulltext in Publisher
LEADER 03693nam a2200577Ia 4500
001 10.1186-s12859-021-04438-4
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Differential expression analysis using a model-based gene clustering algorithm for RNA-seq data 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04438-4 
520 3 |a Background: RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering is used to classify DEGs with similar expression patterns for the subsequent analyses of data from experiments such as time-courses or multi-group comparisons. However, gene clustering has rarely been used for analyzing simple two-group data or differential expression (DE). In this study, we report that a model-based clustering algorithm implemented in an R package, MBCluster.Seq, can also be used for DE analysis. Results: MBCluster.Seq was originally designed to take DEGs as input, whereas the proposed method (called MBCdeg) uses all genes for the analysis. The method ranks genes overall by their posterior probability of being assigned to the cluster that displays a non-DEG pattern. We compared the performance of MBCdeg with that of conventional R packages specialized for DE analysis, such as edgeR, DESeq2, and TCC, using simulated and real data. Our results showed that MBCdeg outperformed the other methods when the proportion of DEGs (PDEG) was less than 50%. However, DEG identification using MBCdeg was less consistent than with conventional methods. We compared the effects of different normalization algorithms on MBCdeg, and performed an analysis using MBCdeg in combination with a robust normalization algorithm (called DEGES) that is not implemented in MBCluster.Seq. This combined method showed greater stability than the original MBCdeg with the default normalization algorithm. Conclusions: MBCdeg with DEGES normalization can be used to identify DEGs when the PDEG is relatively low. Because the method is based on gene clustering, the DE result also indicates which expression pattern each gene belongs to. The new method may be useful for the analysis of time-course and multi-group data, where the classification of expression patterns is often required. © 2021, The Author(s). 
650 0 4 |a algorithm 
650 0 4 |a Algorithms 
650 0 4 |a article 
650 0 4 |a cluster analysis 
650 0 4 |a Cluster Analysis 
650 0 4 |a clustering algorithm 
650 0 4 |a Clustering algorithms 
650 0 4 |a controlled study 
650 0 4 |a Differential expression 
650 0 4 |a differential expression analysis 
650 0 4 |a Differentially expressed gene 
650 0 4 |a Expression analysis 
650 0 4 |a Expression patterns 
650 0 4 |a Gene clustering 
650 0 4 |a Gene expression 
650 0 4 |a gene expression profiling 
650 0 4 |a Gene Expression Profiling 
650 0 4 |a gene identification 
650 0 4 |a Multi-group 
650 0 4 |a Normalization algorithms 
650 0 4 |a Posterior probability 
650 0 4 |a probability 
650 0 4 |a protein expression 
650 0 4 |a RNA 
650 0 4 |a RNA sequencing 
650 0 4 |a RNA-seq 
650 0 4 |a RNA-Seq 
650 0 4 |a sequence analysis 
650 0 4 |a Sequence Analysis, RNA 
650 0 4 |a simulation 
650 0 4 |a Time course 
700 1 |a Kadota, K.  |e author 
700 1 |a Osabe, T.  |e author 
700 1 |a Shimizu, K.  |e author 
773 |t BMC Bioinformatics