A new statistic to evaluate imputation reliability.

Background: As the amount of data from genome-wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, there are two situations for which imputation has been problematic: (1) polymorphisms with lo...


Bibliographic Details
Main Authors: Peng Lin, Sarah M Hartz, Zhehao Zhang, Scott F Saccone, Jia Wang, Jay A Tischfield, Howard J Edenberg, John R Kramer, Alison M Goate, Laura J Bierut, John P Rice, COGA Collaborators COGEND Collaborators, GENEVA
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2010-03-01
Series: PLoS ONE
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/20300623/?tool=EBI
id doaj-83e2f397fb844d3db9a032c559805d02
record_format Article
spelling doaj-83e2f397fb844d3db9a032c559805d02 | 2021-03-04T02:31:33Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2010-03-01 | 5 | 3 | e9697 | 10.1371/journal.pone.0009697 | A new statistic to evaluate imputation reliability. | Peng Lin, Sarah M Hartz, Zhehao Zhang, Scott F Saccone, Jia Wang, Jay A Tischfield, Howard J Edenberg, John R Kramer, Alison M Goate, Laura J Bierut, John P Rice, COGA Collaborators COGEND Collaborators, GENEVA | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/20300623/?tool=EBI
collection DOAJ
language English
format Article
sources DOAJ
author Peng Lin
Sarah M Hartz
Zhehao Zhang
Scott F Saccone
Jia Wang
Jay A Tischfield
Howard J Edenberg
John R Kramer
Alison M Goate
Laura J Bierut
John P Rice
COGA Collaborators COGEND Collaborators, GENEVA
title A new statistic to evaluate imputation reliability.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2010-03-01
description Background: As the amount of data from genome-wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, there are two situations for which imputation has been problematic: (1) polymorphisms with low minor allele frequency (MAF), and (2) datasets where subjects are genotyped on different platforms. Traditional measures of imputation cannot effectively address these problems.
Methodology/principal findings: We introduce a new statistic, the imputation quality score (IQS). To differentiate between well-imputed and poorly-imputed single nucleotide polymorphisms (SNPs), IQS adjusts the concordance between imputed and genotyped SNPs for chance. We first evaluated IQS in relation to minor allele frequency. Using a sample of subjects genotyped on the Illumina 1 M array, we extracted those SNPs that were also on the Illumina 550 K array and imputed them to the full set of the 1 M SNPs. As expected, the average IQS value drops dramatically with a decrease in minor allele frequency, indicating that IQS appropriately adjusts for minor allele frequency. We then evaluated whether IQS can filter poorly-imputed SNPs in situations where cases and controls are genotyped on different platforms. Randomly dividing the data into "cases" and "controls", we extracted the Illumina 550 K SNPs from the cases and imputed the remaining Illumina 1 M SNPs. The initial Q-Q plot for the test of association between cases and controls was grossly distorted (lambda = 1.15) and had 4016 false positives, reflecting imputation error. After filtering out SNPs with IQS < 0.9, the Q-Q plot was acceptable and no false positives remained. We then evaluated the robustness of IQS computed independently on the two halves of the data. In both European Americans and African Americans the correlation was >0.99, demonstrating that a database of IQS values from common imputations could be used as an effective filter to combine data genotyped on different platforms.
Conclusions/significance: IQS effectively differentiates well-imputed and poorly-imputed SNPs. It is particularly useful for SNPs with low minor allele frequency and when datasets are genotyped on different platforms.
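The description says IQS adjusts the concordance between imputed and genotyped SNPs for chance. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes a kappa-style form, (P_o - P_c) / (1 - P_c), where P_o is the observed concordance and P_c the concordance expected by chance, and it assumes hard genotype calls tabulated in a 3x3 table of counts. The function names, the table layout, and the example counts are all hypothetical.

```python
import math
from typing import Dict, List, Set

Table = List[List[int]]  # table[i][j]: subjects with imputed genotype i, observed genotype j


def chance_adjusted_concordance(table: Table) -> float:
    """Kappa-style score (P_o - P_c) / (1 - P_c); assumed form, not taken from the record."""
    k = len(table)
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no subjects")
    # Observed concordance: fraction of subjects on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance concordance: product of the marginal totals, summed over genotypes.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_c = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_c == 1.0:
        return math.nan  # undefined when every subject has the same genotype in both sets
    return (p_o - p_c) / (1.0 - p_c)


def filter_snps(tables: Dict[str, Table], threshold: float = 0.9) -> Set[str]:
    """Keep SNPs whose score meets the threshold; NaN scores are dropped."""
    return {snp for snp, t in tables.items()
            if chance_adjusted_concordance(t) >= threshold}


# Hypothetical counts. The rare SNP has 98% raw concordance, yet the imputation
# never called the minor allele, so the chance-adjusted score is 0.
example = {
    "rare_snp": [[98, 2, 0], [0, 0, 0], [0, 0, 0]],
    "common_snp": [[50, 0, 0], [0, 40, 0], [0, 0, 10]],
}
print(filter_snps(example))
```

The rare-SNP case shows why raw concordance can mislead at low minor allele frequency: calling every subject homozygous for the major allele agrees with the genotypes 98% of the time, but that agreement is exactly what chance predicts.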
url https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/20300623/?tool=EBI