Quantile regression for zero-inflated outcomes

Zero-inflated outcomes are common in biomedical studies, where the excessive zeros indicate some special but undetectable events. Quantile regression is potentially advantageous in analyzing zero-inflated outcomes for two reasons. First, compared to parametric models such as the zero-inflated Poi...

Bibliographic Details
Main Author:	Ling, Wodan
Language:	English
Published:	2019
Subjects:	Biometry Quantile regression Mathematical models Distribution (Probability theory)
Online Access:	https://doi.org/10.7916/d8-rre7-sw52

id	ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-d8-rre7-sw52
record_format	oai_dc
collection	NDLTD
language	English
sources	NDLTD
topic	Biometry Quantile regression Mathematical models Distribution (Probability theory)
spellingShingle	Biometry Quantile regression Mathematical models Distribution (Probability theory) Ling, Wodan Quantile regression for zero-inflated outcomes
description	Zero-inflated outcomes are common in biomedical studies, where the excessive zeros indicate some special but undetectable events. Quantile regression is potentially advantageous in analyzing zero-inflated outcomes for two reasons. First, compared to parametric models such as the zero-inflated Poisson and two-part models, quantile regression gives robust and accurate estimation by avoiding likelihood specification, and it can capture tail events and heterogeneity across the outcome distribution. Second, while mean-based regression may be misinterpreted for a zero-inflated outcome, the interpretation of quantiles is naturally compatible with the underlying process that such an outcome is intended to measure. Unfortunately, uncorrected linear quantile regression is not directly applicable, also for two reasons. First, the feasibility of estimation and the validity of inference in quantile regression require the conditional distribution of the outcome to be absolutely continuous, a requirement that zero-inflation violates. Second, direct quantile regression implicitly assumes a constant chance of observing a positive outcome, but in most cases the degree of zero-inflation varies with the covariates. Thus the conditional quantile function of the outcome depends on the covariates in a nonlinear fashion. To analyze zero-inflated outcomes while retaining the merits of quantile regression, we propose a novel quantile regression framework that addresses all of the issues above.
In the first part of this dissertation, we propose a two-part model that comprises a logistic regression for the probability of being positive and a linear quantile regression for the positive part, adjusted for subject-specific zero-inflation. Inference on the estimated conditional quantile and covariate effect is not trivial under such a two-part model. We then develop an algorithm that achieves consistent estimation of the conditional quantiles while circumventing the unbounded variance at the quantile level where the conditional quantile changes from zero to positive. Furthermore, we develop an inference tool to determine the quantile treatment effect associated with a covariate at a given quantile level. We evaluate the proposed method and compare it with existing approaches through simulation studies and a real data analysis of the risk factors for carotid atherosclerosis.
In the second part, building on the two-part model above, we develop ZIQRank, a zero-inflated quantile rank-score-based test for detecting differences in distributions. The proposed test extends the local inference of the first part to a simultaneous one, and it handles zero-inflation and heterogeneity at the same time. It comprises a valid test in the logistic regression for the zero-inflation and rank-score-based tests at multiple quantiles for the positive part, adjusted for zero-inflation. The p-values are combined with a procedure selected according to the extent of zero-inflation and heterogeneity in the data. Simulation studies show that, compared to existing tests, the proposed test has higher power in detecting differential distributions. Finally, we apply the ZIQRank test to a human scRNA-seq dataset to study differentially expressed genes in neoplastic and regular cells. It successfully discovers a group of crucial genes associated with glioma, while the other methods fail to do so.
In the third part, we extend the proposed two-part quantile regression model for zero-inflated outcomes and the ZIQRank test to longitudinal data. Each part of the two-part model is modified into a marginal longitudinal model (GEE), conditioning on the outcome at the previous time point and its zero/positive status. We apply the model and the test to study the effect of a recommender system aimed at boosting user engagement with a suite of smartphone apps designed for depressed patients. Compared to existing methods, our model framework demonstrates superior performance in model fitting, prediction, and critical feature detection.
author	Ling, Wodan
author_facet	Ling, Wodan
author_sort	Ling, Wodan
title	Quantile regression for zero-inflated outcomes
title_short	Quantile regression for zero-inflated outcomes
title_full	Quantile regression for zero-inflated outcomes
title_fullStr	Quantile regression for zero-inflated outcomes
title_full_unstemmed	Quantile regression for zero-inflated outcomes
title_sort	quantile regression for zero-inflated outcomes
publishDate	2019
url	https://doi.org/10.7916/d8-rre7-sw52
work_keys_str_mv	AT lingwodan quantileregressionforzeroinflatedoutcomes
_version_	1719269888039583744
spelling	ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-d8-rre7-sw52 2019-10-17T03:18:17Z Quantile regression for zero-inflated outcomes Ling, Wodan 2019 Theses Biometry Quantile regression Mathematical models Distribution (Probability theory) English https://doi.org/10.7916/d8-rre7-sw52
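
The description states that, under zero-inflation, the conditional quantile of the outcome depends on the covariates nonlinearly even when the positive part follows a linear quantile model. The sketch below illustrates that point with a generic mixture identity; it is not the dissertation's estimation algorithm. The function names, the coefficients alpha and beta, and the choice of a uniform positive part are all assumptions made for illustration only.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def zero_inflated_quantile(tau, p_positive, positive_quantile):
    """tau-th quantile of Y when P(Y > 0) = p_positive.

    positive_quantile(u) is the quantile function of Y given Y > 0.
    Below the zero mass the quantile is 0; above it, the quantile
    level is rescaled to the positive part.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    p_zero = 1.0 - p_positive
    if tau <= p_zero:
        return 0.0
    adjusted_level = (tau - p_zero) / p_positive
    return positive_quantile(adjusted_level)


def conditional_quantile(tau, x, alpha=(-1.0, 1.0), beta=(1.0, 0.5)):
    """Hypothetical two-part example (all coefficients are made up).

    Part 1: logistic model for P(Y > 0 | x).
    Part 2: Y | Y > 0, x is uniform on (0, beta0 + beta1 * x), so its
            u-th quantile u * (beta0 + beta1 * x) is linear in x.
    """
    p_positive = logistic(alpha[0] + alpha[1] * x)
    upper = beta[0] + beta[1] * x
    return zero_inflated_quantile(tau, p_positive, lambda u: u * upper)


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0):
        print(x, round(conditional_quantile(0.75, x), 4))
```

Although the positive part is linear in x at every quantile level, the printed 0.75-quantiles do not change by a constant amount per unit of x, because the adjusted quantile level itself moves with the covariate-dependent probability of a positive outcome.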

Similar Items