A Bootstrap Based Measure Robust to the Choice of Normalization Methods for Detecting Rhythmic Features in High Dimensional Data

Motivation: Gene-expression data obtained from high-throughput technologies are subject to various sources of noise, so the raw data are pre-processed before being formally analyzed. Normalization is a key pre-processing step because it removes systematic variation across arrays. T...


Bibliographic Details
Main Authors: Yolanda Larriba, Cristina Rueda, Miguel A. Fernández, Shyamal D. Peddada
Format: Article
Language: English
Published: Frontiers Media S.A. 2018-02-01
Series: Frontiers in Genetics
Subjects: rhythmicity; high-throughput technologies; normalization; oscillatory systems; circadian genes
Online Access: http://journal.frontiersin.org/article/10.3389/fgene.2018.00024/full
id doaj-7b20106b87874f379fbc3795099848ae
record_format Article
spelling doaj-7b20106b87874f379fbc3795099848ae 2020-11-24T21:37:05Z eng Frontiers Media S.A. Frontiers in Genetics 1664-8021 2018-02-01 9 10.3389/fgene.2018.00024 316891
author_affiliation Yolanda Larriba, Cristina Rueda, Miguel A. Fernández: Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
Shyamal D. Peddada: Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, United States; Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
collection DOAJ
language English
format Article
sources DOAJ
author Yolanda Larriba
Cristina Rueda
Miguel A. Fernández
Shyamal D. Peddada
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2018-02-01
description Motivation: Gene-expression data obtained from high-throughput technologies are subject to various sources of noise, so the raw data are pre-processed before being formally analyzed. Normalization is a key pre-processing step because it removes systematic variation across arrays. Numerous normalization methods are available in the literature. In our experience, in oscillatory systems such as the cell cycle or the circadian clock, the choice of normalization method may substantially affect whether a gene is determined to be rhythmic. Thus the apparent rhythmicity of a gene can be purely an artifact of how the data were normalized. Since determining rhythmic genes is an important component of modern toxicological and pharmacological studies, it is important to identify truly rhythmic genes in a way that is robust to the choice of normalization method.
Results: In this paper we introduce a rhythmicity measure and a bootstrap methodology to detect rhythmic genes in an oscillatory system. Although the proposed methodology can be used for any high-throughput gene-expression data, we illustrate it using several publicly available circadian clock microarray gene-expression datasets. We demonstrate that the choice of normalization method has very little effect on the proposed methodology. Specifically, for any pair of normalization methods considered in this paper, the resulting values of the rhythmicity measure are highly correlated. This suggests that the proposed measure is robust to the choice of normalization method. Consequently, the rhythmicity of a gene detected in this way is potentially not a mere artifact of the normalization method used. Lastly, as demonstrated in the paper, the proposed bootstrap methodology can also be used to simulate data for genes participating in an oscillatory system from a reference dataset.
Availability: User-friendly code implemented in R can be downloaded from http://www.eio.uva.es/~miguel/robustdetectionprocedure.html
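The robustness check summarized under Results (score every gene under each normalization, then correlate the scores for a pair of normalizations) can be illustrated with a minimal sketch. This is not the authors' R code, and nearly everything in it is an assumption made for illustration: the abstract does not define the rhythmicity measure, so the R² of a single-harmonic (cosinor) fit with an assumed 24 h period stands in for it; median centering and quantile normalization are stand-in normalization methods; the data are simulated; and the bootstrap step is not shown. All function names are hypothetical.

```python
import math
import random

PERIOD = 24.0                          # hours; assumed circadian period
TIMES = [4.0 * k for k in range(12)]   # 12 arrays, one every 4 h (two full periods)


def simulate(n_genes=40, seed=1):
    """Toy genes x arrays matrix: first half rhythmic, second half noise only."""
    rng = random.Random(seed)
    offsets = [rng.gauss(0.0, 0.5) for _ in TIMES]   # systematic per-array shift
    data = []
    for g in range(n_genes):
        base = rng.uniform(5.0, 10.0)
        amp = 2.0 if g < n_genes // 2 else 0.0
        phase = rng.uniform(0.0, 2.0 * math.pi)
        data.append([base + amp * math.cos(2 * math.pi * t / PERIOD - phase)
                     + rng.gauss(0.0, 0.3) + offsets[j]
                     for j, t in enumerate(TIMES)])
    return data


def median_center(data):
    """Stand-in normalization 1: subtract each array's (column's) median."""
    out = [row[:] for row in data]
    for j in range(len(data[0])):
        col = sorted(row[j] for row in data)
        n = len(col)
        med = col[n // 2] if n % 2 else 0.5 * (col[n // 2 - 1] + col[n // 2])
        for row in out:
            row[j] -= med
    return out


def quantile_normalize(data):
    """Stand-in normalization 2: give every array the mean sorted distribution."""
    n_genes, n_arrays = len(data), len(data[0])
    order = [sorted(range(n_genes), key=lambda g: data[g][j])
             for j in range(n_arrays)]
    rank_means = [sum(data[order[j][r]][j] for j in range(n_arrays)) / n_arrays
                  for r in range(n_genes)]
    out = [[0.0] * n_arrays for _ in range(n_genes)]
    for j in range(n_arrays):
        for r, g in enumerate(order[j]):
            out[g][j] = rank_means[r]
    return out


def cosinor_r2(y):
    """Stand-in rhythmicity score: R^2 of a mean + cos + sin fit.

    The closed-form coefficients are valid only because TIMES is equally
    spaced over whole periods, which makes the regressors orthogonal.
    """
    n = len(y)
    mean = sum(y) / n
    sst = sum((v - mean) ** 2 for v in y)
    if sst < 1e-12:                    # flat profile: nothing to explain
        return 0.0
    c = [math.cos(2 * math.pi * t / PERIOD) for t in TIMES]
    s = [math.sin(2 * math.pi * t / PERIOD) for t in TIMES]
    a = 2.0 / n * sum(v * ci for v, ci in zip(y, c))
    b = 2.0 / n * sum(v * si for v, si in zip(y, s))
    ssr = sum((v - mean - a * ci - b * si) ** 2 for v, ci, si in zip(y, c, s))
    return min(1.0, max(0.0, 1.0 - ssr / sst))


def pearson(x, y):
    """Plain Pearson correlation between two equal-length score vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


raw = simulate()
scores = {name: [cosinor_r2(row) for row in normalize(raw)]
          for name, normalize in [("median", median_center),
                                  ("quantile", quantile_normalize)]}
r = pearson(scores["median"], scores["quantile"])
print(f"correlation of scores across the two normalizations: {r:.3f}")
```

A correlation near 1 would indicate that the stand-in score ranks genes similarly under both normalizations, which is the kind of agreement the abstract reports for the proposed measure; the toy result says nothing about the paper's actual datasets.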
topic rhythmicity
high-throughput technologies
normalization
oscillatory systems
circadian genes
url http://journal.frontiersin.org/article/10.3389/fgene.2018.00024/full