The statistical analysis of complex sampling data

Bibliographic Details
Main Author:	Paulse, Bradley
Other Authors:	Luus, Retha
Language:	en
Published:	University of the Western Cape, 2019
Subjects:	Complex sampling; Inference; Weighting; Survey data; Resampling
Online Access:	http://hdl.handle.net/11394/6754

Contributors:	Paulse, Bradley; Luus, Retha; Blignaut, Rénette
Degree:	Magister Scientiae - MSc
Date:	2018
Date added:	2019-05-09

Description

Most standard statistical techniques illustrated in textbooks assume that the data are collected from a simple random sample (SRS) and hence are independently and identically distributed (i.i.d.). In reality, data are often sourced through complex sampling (CS) designs, with a combination of stratification and clustering at different levels of the design. Consequently, CS data are not i.i.d., and sampling weights, developed over different stages of the design, are calculated and included in the analysis to account for the sampling design. Logistic regression is often employed in the modelling of survey data, since the response under investigation typically has a dichotomous outcome. Furthermore, since the logistic regression model makes no assumptions of homogeneity of variance or normality, it is appealing when modelling a dichotomous response from survey data. This research compares the estimates of the logistic regression model parameters obtained when the CS design is accounted for (i.e. with weighting) with those obtained when the data are modelled under an SRS design (i.e. without weighting). In addition, the standard errors of the estimators are obtained using three variance estimation techniques, viz. Taylor series linearization, the jackknife and the bootstrap. These estimated standard errors are used to calculate the standard (asymptotic) interval, which is compared with the bootstrap percentile interval in terms of interval coverage probability. A further comparison is made between results obtained using only design weights and those obtained using calibrated and integrated sampling weights.

This simulation study is based on the Income and Expenditure Survey (IES) of 2005/2006. The results showed that the estimators generally performed better when weighting was used than when the design was ignored (i.e. under the assumption of SRS), with the results for Taylor series linearization being more stable.
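The record does not include the thesis's code or model specification. The Python below is a minimal sketch of two of the ideas the abstract names, under several stated assumptions. It uses a single covariate and a hypothetical simulated two-stratum clustered sample, not IES data. It fits a weighted (pseudo-maximum-likelihood) and an unweighted logistic regression by Newton-Raphson. It then estimates design-based standard errors with a stratified delete-one-PSU jackknife (often called JKn) and forms the standard (Wald) 95% interval. The names `fit_weighted_logit`, `jackknife_se` and `simulate` are hypothetical, and this is not presented as the author's implementation.

```python
import math
import random


def fit_weighted_logit(x, y, w, iters=50, tol=1e-10):
    """Weighted (pseudo-)maximum-likelihood fit of logit P(y=1) = b0 + b1*x.

    Newton-Raphson on the weighted log-likelihood; setting every weight
    to 1 gives the ordinary unweighted (SRS-style) fit.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)          # weighted score contribution
            v = wi * p * (1.0 - p)     # weighted information contribution
            g0 += r
            g1 += r * xi
            i00 += v
            i01 += v * xi
            i11 += v * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} g, with the 2x2 inverse written out
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def jackknife_se(x, y, w, stratum, psu):
    """Stratified delete-one-PSU (JKn) jackknife standard errors."""
    full = fit_weighted_logit(x, y, w)
    clusters = {}
    for h, c in zip(stratum, psu):
        clusters.setdefault(h, set()).add(c)
    var = [0.0, 0.0]
    for h, members in clusters.items():
        n_h = len(members)
        if n_h < 2:
            continue  # simplification: single-PSU strata are skipped
        for dropped in sorted(members):
            # Zero the dropped PSU; inflate the rest of stratum h by n_h/(n_h-1).
            w_rep = [
                wi if s != h else (0.0 if c == dropped else wi * n_h / (n_h - 1))
                for wi, s, c in zip(w, stratum, psu)
            ]
            rep = fit_weighted_logit(x, y, w_rep)
            for k in range(2):
                var[k] += (n_h - 1) / n_h * (rep[k] - full[k]) ** 2
    return full, (math.sqrt(var[0]), math.sqrt(var[1]))


def simulate(seed=1):
    """Hypothetical two-stratum clustered sample with unequal weights (not IES data)."""
    rng = random.Random(seed)
    x, y, w, stratum, psu = [], [], [], [], []
    for h, weight in ((0, 10.0), (1, 40.0)):   # stratum 1 is sampled more sparsely
        for c in range(5):                       # 5 PSUs (clusters) per stratum
            u = rng.gauss(0.0, 0.5)              # effect shared within the cluster
            for _ in range(30):                  # 30 units per PSU
                xi = rng.gauss(h, 1.0)
                p = 1.0 / (1.0 + math.exp(-(-0.5 + 0.8 * xi + u)))
                x.append(xi)
                y.append(1 if rng.random() < p else 0)
                w.append(weight)
                stratum.append(h)
                psu.append((h, c))
    return x, y, w, stratum, psu


x, y, w, stratum, psu = simulate()
(b0, b1), (se0, se1) = jackknife_se(x, y, w, stratum, psu)
lo, hi = b1 - 1.96 * se1, b1 + 1.96 * se1    # standard (Wald) 95% interval
srs_b0, srs_b1 = fit_weighted_logit(x, y, [1.0] * len(x))
print(f"weighted slope   {b1:.3f}, jackknife SE {se1:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
print(f"unweighted slope {srs_b1:.3f}")
```

Setting every weight to 1 reproduces the unweighted, SRS-style fit, so the two printed lines contrast the weighted and unweighted analyses that the abstract compares. Strata with a single PSU are simply skipped here, whereas a real analysis would usually collapse them. The Taylor series linearization and bootstrap variance estimators named in the abstract are not shown.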

Similar Items