Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation.

Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent....

Bibliographic Details
Main Authors:	Cameron Palmer, Itsik Pe'er
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2016-06-01
Series:	PLoS Genetics
Online Access:	http://europepmc.org/articles/PMC4910998?pdf=render

id	doaj-cba25bf5bd244a8fb1cc5f1f036bdac4
record_format	Article
spelling	doaj-cba25bf5bd244a8fb1cc5f1f036bdac4 | 2020-11-24T22:20:16Z | eng | Public Library of Science (PLoS) | PLoS Genetics | 1553-7390 | 1553-7404 | 2016-06-01 | 12 | 6 | e1006091 | 10.1371/journal.pgen.1006091 | Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation. | Cameron Palmer | Itsik Pe'er
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Cameron Palmer Itsik Pe'er
title	Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation.
publisher	Public Library of Science (PLoS)
series	PLoS Genetics
issn	1553-7390 1553-7404
publishDate	2016-06-01
description	Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data.
url	http://europepmc.org/articles/PMC4910998?pdf=render
work_keys_str_mv	AT cameronpalmer biascharacterizationinprobabilisticgenotypedataandimprovedsignaldetectionwithmultipleimputation AT itsikpeer biascharacterizationinprobabilisticgenotypedataandimprovedsignaldetectionwithmultipleimputation
_version_	1725776075879874560

Similar Items