Semiparametric Analysis of Randomized Data with Missing Covariates in Logistic Regression

Bibliographic Details
Main Authors: Shu-Hui Hsieh, 謝淑惠
Other Authors: Shen-Ming Lee
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/51048187299014857685
Description
Summary: Ph.D. === Tunghai University === Department of Statistics === 97 === Randomized response is an interview technique designed to eliminate response bias when sensitive questions are asked. In this dissertation, we present a logistic regression model for randomized response data when the covariates of some subjects are missing at random. First, based on the validation data set, we propose Horvitz and Thompson (1952)-type weighted estimators, and in particular we investigate how they behave under different estimates of the selection probabilities. We present large sample theory for the proposed estimators and show that they are more efficient than the estimator using the true selection probabilities. Simulation results support the theoretical analysis. Under the assumption that the observed covariate and the surrogate variable are categorical, and using the empirical average estimator for the selection probabilities, we show that both the augmented inverse probability weighted (AIPW) estimator and the mean score estimator reduce to weighted estimators. Although these estimating equations are different, they lead numerically to exactly the same root. Second, based on both the validation and non-validation data sets, two semiparametric approaches are developed for analyzing randomized response data with missing covariates in the logistic regression model. One of the two estimators is an extension of the validation likelihood estimator of Breslow and Cain (1988). The other is a joint conditional likelihood estimator based on both the validation and non-validation data sets. We present large sample theory for the proposed estimators. Simulation results show that the joint conditional likelihood estimator is more efficient than the validation likelihood estimator, the weighted estimator, the complete-case estimator, and the partial likelihood estimator. We also illustrate these methods using data from a cable TV study.
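
The weighted-estimator idea in the summary can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, and it rests on simplifying assumptions: the randomized response layer is omitted, so the binary outcome y is observed directly; there is a single continuous covariate x, missing at random given (y, w), where w is a binary surrogate; and the selection probabilities are estimated by empirical averages within the (y, w) cells. All function names, the simulation design, and the parameter values are hypothetical.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def estimate_selection_probs(delta, y, w):
    """Empirical-average estimate of P(delta = 1 | y, w) for categorical y and w."""
    pi_hat = np.empty(len(y), dtype=float)
    for yv in np.unique(y):
        for wv in np.unique(w):
            cell = (y == yv) & (w == wv)
            if cell.any():
                pi_hat[cell] = delta[cell].mean()
    return pi_hat

def weighted_logistic(x, y, weights, n_iter=50, tol=1e-10):
    """Solve sum_i weights_i * (y_i - expit(b0 + b1 * x_i)) * (1, x_i) = 0 by Newton-Raphson."""
    design = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = expit(design @ beta)
        score = design.T @ (weights * (y - p))
        info = (design * (weights * p * (1.0 - p))[:, None]).T @ design
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical simulation design (assumed values, not taken from the dissertation).
rng = np.random.default_rng(0)
n = 20000
beta_true = np.array([-0.5, 1.0])
x = rng.normal(size=n)
w = (x + rng.normal(size=n) > 0).astype(int)        # binary surrogate for x
y = rng.binomial(1, expit(beta_true[0] + beta_true[1] * x))
pi_true = 0.3 + 0.4 * y + 0.2 * w                   # selection depends on (y, w) only
delta = rng.binomial(1, pi_true)                    # 1 = x observed (validation set)

pi_hat = estimate_selection_probs(delta, y, w)

# Horvitz-Thompson-type weights: 1 / pi_hat on the validation set, 0 elsewhere.
observed = delta == 1
weights = np.zeros(n)
weights[observed] = 1.0 / pi_hat[observed]
x_used = np.where(observed, x, 0.0)                 # unobserved x never enters (weight 0)

beta_hat = weighted_logistic(x_used, y, weights)
print("true beta:", beta_true, "weighted estimate:", beta_hat)
```

Replacing pi_hat with pi_true in the weights gives the estimator based on the true selection probabilities, which the summary reports to be less efficient than the versions using estimated probabilities.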