The Study of Confidence Interval Coverage for Small Samples of Overdispersed Count Data
Master's thesis === National Taiwan University === Graduate Institute of Agronomy === academic year 91 (ROC calendar)
Main Authors: | Chen, Wei-Ting 陳威廷 |
---|---|
Other Authors: | Pong, Yun-Ming 彭雲明 |
Format: | Others |
Language: | zh-TW |
Published: | 2003 |
Online Access: | http://ndltd.ncl.edu.tw/handle/40240222604227989189 |
id |
ndltd-TW-091NTU00417025 |
---|---|
record_format |
oai_dc |
original title (Chinese) |
小樣本過度分散計數資料之信賴區間覆蓋率的研究 |
collection |
NDLTD |
description |
Inference in GLMs (generalized linear models) relies on properties of the maximum likelihood estimate, such as asymptotic normality and the asymptotic variance-covariance matrix. However, real-life data sets are seldom large, so the credibility of the type I error rate in hypothesis testing and of the coverage rate in confidence interval estimation is worth investigating. Lewis (1998) found that the empirical coverage rate is very close to the nominal confidence level of 95%, even when the sample size is as small as 8. Her study, however, ignored an important factor: overdispersion in binomial and Poisson data. According to Nelder and Wedderburn (1989), overdispersion in binomial and Poisson data is so common that it should be considered the norm rather than the exception.
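To make "overdispersion" concrete: a Poisson model assumes Var(Y) = μ, while overdispersed data have Var(Y) = φμ with φ > 1. The abstract does not say how overdispersion was generated, so the minimal sketch below assumes a gamma-Poisson (negative binomial) mixture, for which φ = 1 + μ/k, and estimates φ with the Pearson statistic X²/(n - 1) used in quasi-likelihood fitting. The helper names, the mean μ = 5 and the shape k = 1 are illustrative assumptions, not values from the thesis.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def overdispersed_counts(n, mu, k, rng):
    """Hypothetical generator: gamma-Poisson (negative binomial) counts with
    mean mu and variance mu + mu**2 / k; k=None gives plain Poisson counts."""
    draws = []
    for _ in range(n):
        lam = mu if k is None else rng.gammavariate(k, mu / k)  # gamma mean = mu
        draws.append(poisson_draw(lam, rng))
    return draws


def pearson_dispersion(y):
    """Pearson estimate of phi for an intercept-only Poisson model: X^2 / (n - 1)."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 / ybar for yi in y) / (n - 1)


rng = random.Random(2003)
phi_poisson = pearson_dispersion(overdispersed_counts(20000, 5.0, None, rng))
phi_nb = pearson_dispersion(overdispersed_counts(20000, 5.0, 1.0, rng))
print(f"Poisson counts:          phi_hat = {phi_poisson:.2f}  (theory: 1)")
print(f"Negative binomial, k=1:  phi_hat = {phi_nb:.2f}  (theory: 1 + mu/k = 6)")
```

A large sample is used here only so that the estimate settles near its theoretical value; with n = 8 the same estimate is far noisier, which is the small-sample problem the study addresses.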
In this study, we examine the effect of overdispersion on the coverage rate of confidence intervals for samples of size 8, 16, and 32. Our simulation study shows that the impact of overdispersion is profound. Even with slight overdispersion, 32 or more observations, rather than 8, are required to achieve the nominal 95% confidence level. With mild to grave overdispersion, the required sample size is far greater than 32. An important finding of our study is that when overdispersion is slight to sub-mild, the coverage rate can be remarkably improved by using the t statistic instead of the z statistic. With slight overdispersion and a sample size of 8, replacing the z statistic with the t statistic brings the coverage rate up to the 95% confidence level. The use of the t statistic is justified by the fact that the standardized deviance residuals are approximately standard normal.
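The z-versus-t comparison can be sketched as a small coverage simulation. The thesis's actual models, link functions and dispersion levels are not given in this abstract, so this sketch assumes the simplest case: an intercept-only log-link quasi-Poisson fit, gamma-Poisson data with a hypothetical shape k = 2, and residual degrees of freedom n - 1. The t quantiles are standard table values for 7, 15 and 31 degrees of freedom; all function names are illustrative.

```python
import math
import random
from statistics import NormalDist

# Upper 0.975 quantiles of Student's t with n - 1 df (standard table values).
T975 = {7: 2.364624, 15: 2.131450, 31: 2.039513}
Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def poisson_draw(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def counts(n, mu, k, rng):
    """Gamma-Poisson counts with mean mu, variance mu + mu**2 / k (k=None: Poisson)."""
    return [poisson_draw(mu if k is None else rng.gammavariate(k, mu / k), rng)
            for _ in range(n)]


def coverage(n, mu, k, reps=2000, seed=1):
    """Share of nominal 95% z- and t-intervals for log(mu) that contain the true value,
    using an intercept-only log-link quasi-Poisson fit on the same simulated samples."""
    rng = random.Random(seed)
    q_t = T975[n - 1]
    hits_z = hits_t = 0
    for _ in range(reps):
        y = counts(n, mu, k, rng)
        ybar = sum(y) / n  # MLE of the mean; log(ybar) estimates the intercept
        if ybar == 0:
            continue  # log link undefined: counted as a miss for both intervals
        phi_hat = sum((yi - ybar) ** 2 for yi in y) / ybar / (n - 1)  # Pearson X^2 / df
        se = math.sqrt(phi_hat / (n * ybar))  # quasi-Poisson SE of log(ybar)
        err = abs(math.log(ybar) - math.log(mu))
        hits_z += err <= Z975 * se
        hits_t += err <= q_t * se
    return hits_z / reps, hits_t / reps


for n in (8, 16, 32):
    z_cov, t_cov = coverage(n, mu=5.0, k=2.0)
    print(f"n = {n:2d}   z-interval: {z_cov:.3f}   t-interval: {t_cov:.3f}")
```

Both intervals are built from the same samples and the t quantile always exceeds the z quantile, so the t coverage can never fall below the z coverage; the gain shrinks as n grows because the t quantile approaches 1.96.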
As noted above, when the model is correctly specified, that is, when the correct link function and the correct model function are used, overdispersion has a detrimental effect: it lowers the coverage rate of the confidence interval. However, our study reveals that overdispersion plays a more subtle role than expected. When the model is misspecified, by using the wrong link function, the wrong model function, or both, coverage is usually lost. The main reason for the lower coverage rate is probably bias in the predicted values, which in turn biases the confidence intervals. In this case, overdispersion plays a beneficial role, somewhat raising the coverage rate by widening the confidence interval.
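The widening mechanism can be illustrated with one simple misspecification; the abstract does not describe which wrong models were simulated. This sketch assumes a "wrong model function": data generated from log E[Y] = log(μ0) + 1.5x with gamma-Poisson noise, but fitted with an intercept-only quasi-Poisson model, so the predicted mean at x = 0 is biased upward. The shapes k = 50 ("slight") and k = 1 ("grave"), the design and the function name are hypothetical choices, not the thesis's definitions.

```python
import math
import random
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)


def poisson_draw(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def misspecified_fit(k, n=32, mu0=10.0, slope=1.5, reps=2000, seed=7):
    """True model: log E[Y] = log(mu0) + slope * x, with gamma-Poisson noise (shape k).
    Fitted model (deliberately wrong): intercept-only quasi-Poisson.
    Returns (coverage of mu0 at x = 0, mean 95% CI width on the log scale)."""
    rng = random.Random(seed)
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]  # fixed design on [-1, 1]
    hits, total_width = 0, 0.0
    for _ in range(reps):
        y = [poisson_draw(rng.gammavariate(k, mu0 * math.exp(slope * x) / k), rng)
             for x in xs]
        ybar = sum(y) / n  # biased prediction of E[Y | x = 0] = mu0
        if ybar == 0:
            continue  # log undefined: counted as a miss
        phi_hat = sum((yi - ybar) ** 2 for yi in y) / ybar / (n - 1)
        half = Z975 * math.sqrt(phi_hat / (n * ybar))
        hits += abs(math.log(ybar) - math.log(mu0)) <= half
        total_width += 2 * half
    return hits / reps, total_width / reps


slight = misspecified_fit(k=50.0)
grave = misspecified_fit(k=1.0)
print("slight overdispersion (k=50): coverage %.3f, mean width %.3f" % slight)
print("grave overdispersion (k=1):   coverage %.3f, mean width %.3f" % grave)
```

At both levels part of the width comes from the Pearson estimate absorbing the lack of fit; the extra width at k = 1 comes from overdispersion, and it lets more of the biased intervals reach the true mean.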
Our concluding remarks are as follows. Always consider an overdispersed model when analyzing binomial and Poisson data. Use the t statistic instead of the z statistic to construct a confidence interval when the overdispersion is slight to sub-mild and the sample size is between 8 and 16. For sample sizes of 32 or larger, the resulting confidence interval attains a reasonably good confidence level when overdispersion ranges from slight to mild. With grave overdispersion, the confidence level of the estimated interval is doubtful unless the sample size exceeds 32.
|
author2 |
Pong, Yun-Ming |
author |
Chen, Wei-Ting 陳威廷 |