Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation.

Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent....

Bibliographic Details
Main Authors: Cameron Palmer, Itsik Pe'er
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-06-01
Series: PLoS Genetics
Online Access: http://europepmc.org/articles/PMC4910998?pdf=render
id doaj-cba25bf5bd244a8fb1cc5f1f036bdac4
record_format Article
spelling doaj-cba25bf5bd244a8fb1cc5f1f036bdac4; 2020-11-24T22:20:16Z; eng; Public Library of Science (PLoS); PLoS Genetics; 1553-7390; 1553-7404; 2016-06-01; 12(6): e1006091; 10.1371/journal.pgen.1006091; Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation.; Cameron Palmer; Itsik Pe'er; http://europepmc.org/articles/PMC4910998?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Cameron Palmer
Itsik Pe'er
title Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation.
publisher Public Library of Science (PLoS)
series PLoS Genetics
issn 1553-7390
1553-7404
publishDate 2016-06-01
description Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data.
url http://europepmc.org/articles/PMC4910998?pdf=render