Summary: | Master's === National Taiwan University === Graduate Institute of Epidemiology === 91 === Abstract
Linkage-analysis data are subject to two types of measurement error: pedigree error and typing error. This study focuses on developing a method for handling random typing errors in affected sib-pair data. The frequency of typing errors is usually in the range of 0.5% to 3%. Even at such low rates, typing errors can seriously affect gene mapping, causing overestimation of the recombination fraction, a misleading locus order, and biased estimates of disequilibrium measures and the interference coefficient. In general, there are two ways to deal with errors: error detection and error modeling. In the error detection approach, once an individual whose data may contain errors is identified, that individual is retyped or removed and the analysis proceeds. By comparison, the error modeling approach requires more complicated calculation because the errors are incorporated into the analysis. This study proposes adjusting the mean test for typing errors by using posterior weights. Because of errors, the observed IBD distribution is not necessarily equal to the true IBD distribution. When the number of sib-pairs is fixed, many combinations of IBD distributions are possible, and each one could be the true IBD distribution. If the corresponding frequencies of the IBD genes are assumed to follow a multinomial distribution whose parameters are modeled by a Dirichlet prior, the posterior distribution for the IBD genes can be constructed by the Bayesian method. In practice, it is hard to consider every IBD distribution that could be the true one. When the total number of sib-pairs is fixed, each combination of IBD gene frequencies can be converted to an expected IBD distribution, from which the corresponding statistic is obtained. The posterior mean of the mean test statistic is then taken as the adjustment for typing errors. Using a sieving method with a 0.025 increment for the frequencies of the five IBD genes gives 41^5 combinations in total.
After deleting the combinations whose probabilities do not sum to 1 or are not all strictly greater than zero, 78729 combinations still remain. Weighting each possible mean test statistic by the posterior distribution of the IBD gene frequencies yields the new statistic adjusted for typing errors. Because of the heavy computation involved and time limitations, simulation studies of the power of the new statistic remain to be carried out in the future. In this thesis, we only introduce the simulation procedure and demonstrate how to calculate Zstd and Za.
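The posterior-weighting idea can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and it rests on several assumptions: the abstract does not say how the five IBD gene frequencies map to an IBD distribution, so the sketch simplifies to three IBD-sharing categories (0, 1, or 2 alleles shared); the mean test uses the usual null variance of 1/(8n) for fully informative markers; and all function names (`simplex_grid`, `mean_test_z`, `posterior_weighted_z`) are hypothetical. Because of the three-category simplification, the grid size is not comparable to the counts quoted in the abstract.

```python
import itertools
import math


def simplex_grid(k, steps):
    """Yield every k-tuple of strictly positive multiples of 1/steps summing to 1."""
    # Integer arithmetic avoids floating-point drift when checking the sum.
    for parts in itertools.product(range(1, steps), repeat=k - 1):
        last = steps - sum(parts)
        if last >= 1:
            yield tuple(p / steps for p in parts + (last,))


def log_dirichlet_kernel(theta, alpha):
    """Unnormalised log density of Dirichlet(alpha) evaluated at theta."""
    return sum((a - 1.0) * math.log(t) for t, a in zip(theta, alpha))


def mean_test_z(theta, n_pairs):
    """Mean test Z for expected IBD proportions theta = (p0, p1, p2).

    Assumption: fully informative markers, so the proportion of alleles
    shared IBD has null mean 1/2 and null variance 1/(8 * n_pairs).
    """
    _, p1, p2 = theta
    sharing = p2 + 0.5 * p1
    return (sharing - 0.5) / math.sqrt(1.0 / (8.0 * n_pairs))


def posterior_weighted_z(counts, prior_alpha, steps=40):
    """Average the mean test Z over the grid, weighted by the Dirichlet posterior."""
    n_pairs = sum(counts)
    # Dirichlet prior + multinomial counts gives a Dirichlet(alpha + counts) posterior.
    post_alpha = [a + c for a, c in zip(prior_alpha, counts)]
    grid = list(simplex_grid(len(counts), steps))
    log_w = [log_dirichlet_kernel(t, post_alpha) for t in grid]
    shift = max(log_w)  # subtract the maximum before exponentiating for stability
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    return sum(w * mean_test_z(t, n_pairs) for w, t in zip(weights, grid)) / total


if __name__ == "__main__":
    # Hypothetical observed counts of sib-pairs sharing 0, 1, 2 alleles IBD.
    print(posterior_weighted_z(counts=(10, 50, 40), prior_alpha=(1.0, 1.0, 1.0)))
```

With `steps=40` the grid uses the same 0.025 increment as the abstract. Normalising the weights over the grid, instead of using the exact Dirichlet normalising constant, keeps the weighted average well defined on a discrete grid.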
|