Systematic Identification of Housekeeping Genes Possibly Used as References in <i>Caenorhabditis elegans</i> by Large-Scale Data Integration

Bibliographic Details
Main Authors: Jingxin Tao, Youjin Hao, Xudong Li, Huachun Yin, Xiner Nie, Jie Zhang, Boying Xu, Qiao Chen, Bo Li
Format: Article
Language: English
Published: MDPI AG 2020-03-01
Series: Cells
Online Access: https://www.mdpi.com/2073-4409/9/3/786
Description
Summary: For accurate gene expression quantification, gene expression data must be normalized against reliable reference genes. The expression levels of commonly used reference genes are known to vary considerably under different experimental conditions, which limits their use for data normalization. In this study, an unbiased identification of reference genes in <i>Caenorhabditis elegans</i> was performed based on 145 microarray datasets (2296 gene array samples) covering different developmental stages, different tissues, drug treatments, lifestyle, and various stresses. As a result, thirteen housekeeping genes (<i>rps-23</i>, <i>rps-26</i>, <i>rps-27</i>, <i>rps-16</i>, <i>rps-2</i>, <i>rps-4</i>, <i>rps-17</i>, <i>rpl-24.1</i>, <i>rpl-27</i>, <i>rpl-33</i>, <i>rpl-36</i>, <i>rpl-35</i>, and <i>rpl-15</i>) with enhanced stability were identified using six popular normalization algorithms and the <i>RankAggreg</i> method. Functional enrichment analysis revealed that these genes were significantly overrepresented in GO terms or KEGG pathways related to ribosomes. Validation using recently published datasets showed that the expression of the newly identified candidate reference genes was more stable than that of the commonly used reference genes. Based on these results, we recommend <i>rpl-33</i> and <i>rps-26</i> as the optimal reference genes for microarray data and <i>rps-2</i> and <i>rps-4</i> for RNA-sequencing data validation. More importantly, the most stable gene, <i>rps-23</i>, is expected to be a promising reference gene for both data types. This study presents, for the first time, a large-scale, microarray-data-driven, genome-wide identification of stable reference genes for normalizing gene expression data and provides a potential guideline for selecting universal internal reference genes in <i>C. elegans</i> for quantitative gene expression analysis.
ISSN: 2073-4409
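
Illustrative sketch: The summary describes scoring candidate genes for expression stability across many datasets and then combining the resulting rankings into one consensus list. The Python sketch below shows that general idea only. It is not the authors' pipeline: it substitutes the coefficient of variation for the six normalization algorithms and a simple mean-rank aggregation for the <i>RankAggreg</i> method. The gene names (`gene_a`, `gene_b`, `gene_c`) and all expression values are hypothetical placeholders.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (lower = more stable)."""
    return pstdev(values) / mean(values)


def rank_by_stability(dataset):
    """Rank the genes of one dataset; rank 1 is the most stable gene."""
    scores = {gene: coefficient_of_variation(v) for gene, v in dataset.items()}
    ordered = sorted(scores, key=scores.get)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def aggregate_ranks(rankings):
    """Order genes by their mean rank across datasets (ties broken by name).

    Assumption: this is a stand-in for a real rank-aggregation method
    and does not reproduce RankAggreg.
    """
    genes = list(rankings[0])
    mean_rank = {g: mean(r[g] for r in rankings) for g in genes}
    return sorted(genes, key=lambda g: (mean_rank[g], g))


# Hypothetical expression values for three placeholder genes in two datasets.
dataset_1 = {
    "gene_a": [10.0, 10.2, 9.8],
    "gene_b": [5.0, 9.0, 2.0],
    "gene_c": [7.0, 7.5, 6.5],
}
dataset_2 = {
    "gene_a": [11.0, 11.1, 10.9],
    "gene_b": [4.0, 8.0, 6.0],
    "gene_c": [8.0, 6.0, 7.0],
}

rankings = [rank_by_stability(d) for d in (dataset_1, dataset_2)]
consensus = aggregate_ranks(rankings)
print(consensus)
```

In this toy input, `gene_a` varies least in both datasets, so it comes first in the consensus order. A real analysis would use measured expression data and an established stability and aggregation method.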