Differential expression analysis for sequence count data via mixtures of negative binomials

The recent advent of next-generation sequencing technologies has revolutionized the way the genome is analyzed. This innovation yields deeper information at a lower cost and in less time, and provides data that are discrete measurements. One of the most important applications with these dat...


Bibliographic Details
Main Author:	Bonafede, Elisabetta <1987>
Other Authors:	Viroli, Cinzia
Format:	Doctoral Thesis
Language:	en
Published:	Alma Mater Studiorum - Università di Bologna 2015
Subjects:	SECS-S/01 Statistica
Online Access:	http://amsdottorato.unibo.it/6741/

id	ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-6741
record_format	oai_dc
spelling	ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-6741 2015-02-14T04:50:16Z Differential expression analysis for sequence count data via mixtures of negative binomials Bonafede, Elisabetta <1987> SECS-S/01 Statistica
Alma Mater Studiorum - Università di Bologna Viroli, Cinzia Robin, Stéphane 2015-02-02 Doctoral Thesis PeerReviewed application/pdf en http://amsdottorato.unibo.it/6741/ info:eu-repo/semantics/openAccess
collection	NDLTD
language	en
format	Doctoral Thesis
sources	NDLTD
topic	SECS-S/01 Statistica
spellingShingle	SECS-S/01 Statistica Bonafede, Elisabetta <1987> Differential expression analysis for sequence count data via mixtures of negative binomials
description	The recent advent of next-generation sequencing technologies has revolutionized the way the genome is analyzed. This innovation yields deeper information at a lower cost and in less time, and provides data that are discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level under two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim is statistical testing, and for modeling these data the Negative Binomial distribution is considered the most adequate, especially because it allows for overdispersion. However, estimating the dispersion parameter is a very delicate issue because little information is usually available for it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Afterwards, three consistent statistical tests are developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it is the best one in reaching the nominal value for the type I error while maintaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
author2	Viroli, Cinzia
author_facet	Viroli, Cinzia Bonafede, Elisabetta <1987>
author	Bonafede, Elisabetta <1987>
author_sort	Bonafede, Elisabetta <1987>
title	Differential expression analysis for sequence count data via mixtures of negative binomials
title_short	Differential expression analysis for sequence count data via mixtures of negative binomials
title_full	Differential expression analysis for sequence count data via mixtures of negative binomials
title_fullStr	Differential expression analysis for sequence count data via mixtures of negative binomials
title_full_unstemmed	Differential expression analysis for sequence count data via mixtures of negative binomials
title_sort	differential expression analysis for sequence count data via mixtures of negative binomials
publisher	Alma Mater Studiorum - Università di Bologna
publishDate	2015
url	http://amsdottorato.unibo.it/6741/
work_keys_str_mv	AT bonafedeelisabetta1987 differentialexpressionanalysisforsequencecountdataviamixturesofnegativebinomials
_version_	1716730933064761344
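
The description in this record states that the Negative Binomial distribution is preferred for sequence counts because it allows for overdispersion, and that little information is usually available for estimating the dispersion. As an illustration only, and not the thesis's mixture model or its tests, the sketch below assumes the mean-dispersion parameterisation Var(Y) = mean + dispersion * mean^2 that is common in RNA-seq work. The function names nb_variance and moment_dispersion are hypothetical, and the method-of-moments estimator is a simple stand-in chosen for clarity.

```python
from typing import Sequence


def nb_variance(mean: float, dispersion: float) -> float:
    """Variance of a Negative Binomial count under the ASSUMED
    parameterisation Var(Y) = mean + dispersion * mean**2.

    A dispersion of 0 gives the Poisson case (variance equal to the mean);
    any positive dispersion makes the variance exceed the mean.
    """
    if mean < 0 or dispersion < 0:
        raise ValueError("mean and dispersion must be non-negative")
    return mean + dispersion * mean ** 2


def moment_dispersion(counts: Sequence[float]) -> float:
    """Method-of-moments dispersion estimate for one gene, truncated at 0.

    Illustrative stand-in only: it solves
    sample_variance = mean + dispersion * mean**2 for the dispersion.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two replicate counts")
    mean = sum(counts) / n
    if mean == 0:
        return 0.0
    sample_var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return max(0.0, (sample_var - mean) / mean ** 2)


if __name__ == "__main__":
    # Same mean, with and without overdispersion.
    print(nb_variance(10, 0.0))
    print(nb_variance(10, 0.5))
    # Three hypothetical replicate counts for a single gene.
    print(moment_dispersion([10, 30, 50]))
```

With the three hypothetical replicates 10, 30 and 50, the sample mean is 30 and the sample variance is 400, so the estimate is 370/900, roughly 0.41. An estimate resting on so few counts per gene is unstable, which is the difficulty the description points to when it motivates sharing information across genes.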


Similar Items