The Study of Confidence Interval Coverage for Small Samples of Overdispersed Count Data

Master's === National Taiwan University === Graduate Institute of Agronomy === 91 === Inference for generalized linear models (GLMs) is based on the properties of the maximum likelihood estimate, such as asymptotic normality and the asymptotic variance-covariance matrix. However, the sample size of real-life data is seldom large. Therefore th...


Bibliographic Details
Main Authors: Chen, Wei-Ting, 陳威廷
Other Authors: Pong, Yun-Ming
Format: Others
Language: zh-TW
Published: 2003
Online Access: http://ndltd.ncl.edu.tw/handle/40240222604227989189
id ndltd-TW-091NTU00417025
record_format oai_dc
spelling ndltd-TW-091NTU00417025 2016-06-20T04:15:45Z http://ndltd.ncl.edu.tw/handle/40240222604227989189 The Study of Confidence Interval Coverage for Small Samples of Overdispersed Count Data 小樣本過度分散計數資料之信賴區間覆蓋率的研究 Chen, Wei-Ting 陳威廷 Master's National Taiwan University Graduate Institute of Agronomy 91 Pong, Yun-Ming 彭雲明 2003 thesis 81 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Agronomy === 91 === Inference for generalized linear models (GLMs) rests on the properties of the maximum likelihood estimate, such as asymptotic normality and the asymptotic variance-covariance matrix. Real-life data sets, however, are seldom large, so the credibility of the type I error rate in hypothesis testing and of the coverage rate in confidence interval estimation is worth investigating. Lewis (1998) found that the empirical coverage rate is very close to the nominal 95% confidence level even when the sample size is as small as 8. Her study, however, ignored an important factor: overdispersion in binomial and Poisson data. According to Nelder and Wedderburn (1989), overdispersion in binomial and Poisson data is so common that it should be regarded as the norm rather than the exception. In this study, we examine the effect of overdispersion on the coverage rate of confidence intervals for sample sizes of 8, 16 and 32. Our simulation study shows that the impact of overdispersion is profound. Even with slight overdispersion, 32 or more observations, rather than 8, are required to achieve the nominal 95% confidence level. With mild to grave overdispersion, the required sample size is far larger than 32. An important finding is that when overdispersion is slight to sub-mild, the coverage rate can be markedly improved by using the t statistic instead of the z statistic. With slight overdispersion and a sample size of 8, replacing the z statistic with the t statistic does bring the coverage rate to the 95% confidence level. The use of the t statistic is justified by the fact that the standardized deviance residual has an approximately standard normal distribution. As noted above, when the model is correctly specified, that is, when the correct link function and the correct model function are used, overdispersion has a detrimental effect: it lowers the coverage rate of the confidence interval.
However, our study reveals that overdispersion plays a more subtle role than expected. When model fitting goes wrong, that is, when the wrong link function, the wrong model function, or both are used, the usual result is a loss of coverage. The main cause of the lower coverage rate may be biased predicted values and the resulting biased confidence intervals. In this case, overdispersion plays a beneficial role: it raises the coverage rate somewhat by widening the confidence interval. Our concluding remarks are as follows. Always consider an overdispersed model when analyzing binomial and Poisson data. Use the t statistic instead of the z statistic to construct a confidence interval when overdispersion is slight to sub-mild and the sample size is in the range 8 to 16. For sample sizes of 32 or larger, the resulting confidence interval has a reasonably good confidence level when overdispersion ranges from slight to mild. For grave overdispersion, the confidence level of the estimated interval is in doubt unless the sample size is larger than 32.
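The z-versus-t comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis design, which is not given in this record: it assumes an intercept-only log-link model, gamma-Poisson (negative binomial) counts as the source of overdispersion, a Pearson estimate of the dispersion, and the same standard error for both intervals so that only the critical value differs. The function names, the true mean of 5 and the dispersion parameter `phi` are all hypothetical choices made for illustration.

```python
import math
import random

# Two-sided 95% critical values. The t quantiles are standard table values
# for n - 1 degrees of freedom (n = 8, 16, 32).
Z_CRIT = 1.960
T_CRIT = {8: 2.365, 16: 2.131, 32: 2.040}


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def overdispersed_count(rng, mean, phi):
    """Gamma-Poisson count with the given mean and variance mean * phi."""
    if phi <= 1.0:
        return poisson_draw(rng, mean)  # no overdispersion: plain Poisson
    shape = mean / (phi - 1.0)
    lam = rng.gammavariate(shape, mean / shape)  # E[lam] = mean
    return poisson_draw(rng, lam)


def coverage(n, phi, mean=5.0, reps=4000, seed=1):
    """Empirical coverage of z- and t-based 95% intervals for log(mean)."""
    rng = random.Random(seed)
    target = math.log(mean)
    hits = {"z": 0, "t": 0}
    for _ in range(reps):
        y = [overdispersed_count(rng, mean, phi) for _ in range(n)]
        ybar = sum(y) / n
        if ybar == 0:
            continue  # log-mean undefined; the replicate counts as a miss
        # Pearson dispersion estimate for the intercept-only model.
        phi_hat = sum((v - ybar) ** 2 for v in y) / (ybar * (n - 1))
        # Delta-method standard error of log(ybar), scaled by the dispersion.
        se = math.sqrt(phi_hat / (n * ybar))
        for name, crit in (("z", Z_CRIT), ("t", T_CRIT[n])):
            if abs(math.log(ybar) - target) <= crit * se:
                hits[name] += 1
    return {name: h / reps for name, h in hits.items()}


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, coverage(n, phi=2.0))
```

Calling `coverage(8, phi=2.0)` returns a dictionary with the empirical coverage rate under each critical value. Because the t quantile exceeds 1.96 at every sample size used here, the t-based interval always contains the z-based one, so its coverage can never be lower. How close either rate comes to 95% depends on the assumed mean, dispersion and number of replicates.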
author2 Pong, Yun-Ming
author Chen, Wei-Ting
陳威廷
title The Study of Confidence Interval Coverage for Small Samples of Overdispersed Count Data
publishDate 2003
url http://ndltd.ncl.edu.tw/handle/40240222604227989189