Imputation of missing genotypes: an empirical evaluation of IMPUTE

Bibliographic Details
Main Authors: Steinberg Martin H, Perls Thomas T, Fucharoen Supan, Chui David HK, Hartley Stephen W, Timofeev Nadia, Zhao Zhenming, Baldwin Clinton T, Sebastiani Paola
Format: Article
Language: English
Published: BMC 2008-12-01
Series: BMC Genetics
Online Access:http://www.biomedcentral.com/1471-2156/9/85
ISSN: 1471-2156
DOI: 10.1186/1471-2156-9-85

Abstract

Background
Imputation of missing genotypes is becoming an increasingly popular solution for synchronizing genotype data collected with different microarray platforms, but the effects of ethnic background, subject ascertainment, and amount of missing data on the accuracy of imputation are not well understood.

Results
We evaluated how accurately the program IMPUTE generates genotype data for partially or fully untyped single nucleotide polymorphisms (SNPs). The program uses a model-based approach that reconstructs the genotype distribution given a set of reference haplotypes and the observed data, and then uses this distribution to compute, for each subject, the marginal probability of each missing genotype; these probabilities are used to impute the missing data. We assembled genome-wide data from five studies covering three ethnic groups: Caucasians, African Americans, and Asians. We randomly removed genotype data and then compared the observed genotypes with those generated by IMPUTE. Our analysis shows 97% median accuracy in Caucasian subjects when less than 10% of the SNPs are untyped and imputed genotypes are accepted regardless of their posterior probability. The median accuracy increases to 99% when an imputed genotype is accepted only if its posterior probability is at least 0.95. The accuracy decreases to 86% in African American subjects and to 94% in Asian subjects. We propose a strategy to improve the accuracy by leveraging the level of admixture in African Americans.

Conclusion
Our analysis suggests that IMPUTE is very accurate in samples of Caucasian origin, slightly less accurate in samples of Asian background, and substantially less accurate in samples of admixed background such as African Americans. Sample size and ascertainment do not seem to affect the accuracy of imputation.
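The evaluation design summarized in the Results (hide known genotypes, impute them, then score the imputed calls with and without a minimum posterior probability) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code and not part of IMPUTE. It assumes the imputation step has already produced, for each masked genotype, three posterior probabilities for carrying 0, 1, or 2 copies of one allele. The helper names `mask_genotypes`, `call_genotype`, and `evaluate` are hypothetical, and the toy data are made up. It also assumes that accuracy under a threshold is computed only over the calls that pass it, which is one plausible reading of the abstract.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a genotype is 0, 1 or 2 copies of one allele, and
# an imputed genotype is a triple of posterior probabilities (P0, P1, P2).
Posterior = Tuple[float, float, float]


def mask_genotypes(genotypes: Sequence[int], fraction: float, seed: int = 0) -> List[int]:
    """Pick a random subset of typed genotypes to hide before imputation."""
    rng = random.Random(seed)
    n_mask = int(round(fraction * len(genotypes)))
    return sorted(rng.sample(range(len(genotypes)), n_mask))


def call_genotype(posterior: Posterior, threshold: float) -> Optional[int]:
    """Return the most probable genotype, or None if its posterior is below threshold."""
    best = max(range(3), key=lambda g: posterior[g])
    return best if posterior[best] >= threshold else None


def evaluate(
    true_genotypes: Sequence[int],
    posteriors: Sequence[Posterior],
    masked: Sequence[int],
    threshold: float = 0.0,
) -> Tuple[float, float]:
    """Compare imputed calls with the hidden true genotypes.

    Returns (accuracy among accepted calls, fraction of masked genotypes accepted).
    """
    accepted = correct = 0
    for i in masked:
        call = call_genotype(posteriors[i], threshold)
        if call is None:
            continue  # posterior too low: genotype is left uncalled
        accepted += 1
        if call == true_genotypes[i]:
            correct += 1
    accuracy = correct / accepted if accepted else math.nan
    call_rate = accepted / len(masked) if masked else math.nan
    return accuracy, call_rate


if __name__ == "__main__":
    # Toy data (made up for illustration): true genotypes and the posteriors an
    # imputation program might report after those genotypes were masked.
    truth = [0, 1, 2, 1, 0]
    posteriors = [
        (0.97, 0.02, 0.01),
        (0.10, 0.60, 0.30),
        (0.00, 0.01, 0.99),
        (0.50, 0.45, 0.05),
        (0.96, 0.03, 0.01),
    ]
    masked = list(range(len(truth)))
    print(evaluate(truth, posteriors, masked, threshold=0.0))   # (0.8, 1.0)
    print(evaluate(truth, posteriors, masked, threshold=0.95))  # (1.0, 0.6)
```

With no threshold, all five toy calls are accepted and four are correct; with a 0.95 threshold, only three calls are accepted, and all of them are correct. This matches the direction of the effect reported in the abstract: requiring a higher posterior probability raises accuracy among the accepted genotypes, at the cost of leaving some genotypes uncalled.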