Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation.
Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent....
Main Authors: | Cameron Palmer, Itsik Pe'er |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2016-06-01 |
Series: | PLoS Genetics |
Online Access: | http://europepmc.org/articles/PMC4910998?pdf=render |
id |
doaj-cba25bf5bd244a8fb1cc5f1f036bdac4 |
---|---|
record_format |
Article |
spelling |
doaj-cba25bf5bd244a8fb1cc5f1f036bdac4; 2020-11-24T22:20:16Z; eng; Public Library of Science (PLoS); PLoS Genetics; 1553-7390; 1553-7404; 2016-06-01; 12; 6; e1006091; 10.1371/journal.pgen.1006091; Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation.; Cameron Palmer; Itsik Pe'er; Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data.; http://europepmc.org/articles/PMC4910998?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Cameron Palmer, Itsik Pe'er |
title |
Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS Genetics |
issn |
1553-7390, 1553-7404 |
publishDate |
2016-06-01 |
description |
Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data. |
url |
http://europepmc.org/articles/PMC4910998?pdf=render |