Rare Variants Imputation in Admixed Populations: Comparison Across Reference Panels and Bioinformatics Tools
Background: Imputation has become a standard approach in genome-wide association studies (GWAS) to infer in silico untyped markers. Although feasibility for common variants imputation is well established, we aimed to assess rare and ultra-rare variants’ imputation in an admixed Caribbean Hispanic popu...
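Main Authors: | Sanjeev Sariya, Joseph H. Lee, Richard Mayeux, Badri N. Vardarajan, Dolly Reyes-Dumeyer, Jennifer J. Manly, Adam M. Brickman, Rafael Lantigua, Martin Medrano, Ivonne Z. Jimenez-Velazquez, Giuseppe Tosto |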
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2019-04-01 |
Series: | Frontiers in Genetics |
Subjects: | rare variants; imputation; admixed population; GWAS; 1000G |
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2019.00239/full |
id |
doaj-f44de7082229410e9de2141980b23398 |
---|---|
record_format |
Article |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sanjeev Sariya, Joseph H. Lee, Richard Mayeux, Badri N. Vardarajan, Dolly Reyes-Dumeyer, Jennifer J. Manly, Adam M. Brickman, Rafael Lantigua, Martin Medrano, Ivonne Z. Jimenez-Velazquez, Giuseppe Tosto |
author_facet |
Sanjeev Sariya, Joseph H. Lee, Richard Mayeux, Badri N. Vardarajan, Dolly Reyes-Dumeyer, Jennifer J. Manly, Adam M. Brickman, Rafael Lantigua, Martin Medrano, Ivonne Z. Jimenez-Velazquez, Giuseppe Tosto |
author_sort |
Sanjeev Sariya |
title |
Rare Variants Imputation in Admixed Populations: Comparison Across Reference Panels and Bioinformatics Tools |
title_sort |
rare variants imputation in admixed populations: comparison across reference panels and bioinformatics tools |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Genetics |
issn |
1664-8021 |
publishDate |
2019-04-01 |
description |
Background: Imputation has become a standard approach in genome-wide association studies (GWAS) to infer untyped markers in silico. Although the feasibility of imputing common variants is well established, we aimed to assess the imputation of rare and ultra-rare variants in an admixed Caribbean Hispanic (CH) population.
Methods: We evaluated imputation accuracy in CH (N = 1,000), focusing on rare (0.1% ≤ minor allele frequency (MAF) ≤ 1%) and ultra-rare (MAF < 0.1%) variants. We used two reference panels, the Haplotype Reference Consortium (HRC; N = 27,165) and the 1000 Genomes Project (1000G phase 3; N = 2,504), together with multiple phasing (SHAPEIT, Eagle2) and imputation (IMPUTE2, MaCH-Admix) algorithms. To assess imputation quality, we reported: (a) high-quality variant counts according to the imputation tools’ internal indexes (e.g., IMPUTE2 “Info” ≥ 80%); (b) a Wilcoxon signed-rank test comparing imputation quality for genotyped variants that were masked and imputed; (c) Cohen’s kappa coefficient to test agreement between imputed and whole-exome sequencing (WES) variants; (d) imputation of the G206A mutation in PSEN1 (ultra-rare in the general population and more frequent in CH), followed by confirmation genotyping. We also tested ancestry proportions (European, African and Native American) against WES-imputation mismatches using Poisson regression.
Results: SHAPEIT2 retrieved a higher percentage of imputed high-quality variants than Eagle2 (rare: 51.02% vs. 48.60%; ultra-rare: 0.66% vs. 0.65%; Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 employing HRC outperformed 1000G (64.50% vs. 59.17% and 1.69% vs. 0.75% for high-quality rare and ultra-rare variants, respectively; Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 outperformed MaCH-Admix. Compared to 1000G, HRC imputation retrieved a higher number of high-quality rare and ultra-rare variants, despite showing lower agreement between imputed and WES variants (e.g., rare: 98.86% for HRC vs. 99.02% for 1000G). High kappa (K = 0.99) was observed for both reference panels. Twelve G206A mutation carriers were imputed, and all were validated by confirmation genotyping. African ancestry was associated with higher imputation errors for uncommon and rare variants (p-value < 1e-05).
Conclusion: Reference panels with larger numbers of haplotypes can improve imputation quality for rare and ultra-rare variants in admixed populations such as CH. Ethnic composition is an important predictor of imputation accuracy, with higher African ancestry associated with poorer imputation accuracy. |
topic |
rare variants; imputation; admixed population; GWAS; 1000G |
url |
https://www.frontiersin.org/article/10.3389/fgene.2019.00239/full |
_version_ |
1725313677055229952 |
spelling |
doaj-f44de7082229410e9de2141980b23398; 2020-11-25T00:34:23Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2019-04-01; 10; 10.3389/fgene.2019.00239; 435399
Rare Variants Imputation in Admixed Populations: Comparison Across Reference Panels and Bioinformatics Tools
Sanjeev Sariya (1, 2); Joseph H. Lee (1, 2, 3); Richard Mayeux (1, 2, 3); Badri N. Vardarajan (1, 2); Dolly Reyes-Dumeyer (1, 2); Jennifer J. Manly (1, 2, 3); Adam M. Brickman (1, 2, 3); Rafael Lantigua (4); Martin Medrano (5); Ivonne Z. Jimenez-Velazquez (6); Giuseppe Tosto (1, 2, 3)
(1) Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, United States
(2) The Gertrude H. Sergievsky Center, College of Physicians and Surgeons, Columbia University, New York, NY, United States
(3) Department of Neurology, College of Physicians and Surgeons, New York-Presbyterian Hospital, Columbia University Medical Center, New York, NY, United States
(4) Medicine College of Physicians and Surgeons, and The Department of Epidemiology, School of Public Health, Columbia University, New York, NY, United States
(5) School of Medicine, Pontificia Universidad Catolica Madre y Maestra, Santiago, Dominican Republic
(6) Department of Medicine, Geriatrics Program, University of Puerto Rico School of Medicine, San Juan, Puerto Rico
https://www.frontiersin.org/article/10.3389/fgene.2019.00239/full
rare variants; imputation; admixed population; GWAS; 1000G |
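The abstract defines its frequency classes by MAF thresholds and measures agreement between imputed and WES genotypes with Cohen's kappa. The sketch below illustrates both ideas in plain Python. It is not the authors' pipeline: the function names, the "other" label for variants above 1% MAF, and the 0/1/2 genotype coding are assumptions made for illustration only.

```python
from collections import Counter


def maf_class(maf):
    """Bin a variant by minor allele frequency using the abstract's thresholds.

    ultra-rare: MAF < 0.1%; rare: 0.1% <= MAF <= 1%.
    The "other" label is a hypothetical catch-all for everything above 1%.
    """
    if maf < 0.001:
        return "ultra-rare"
    if maf <= 0.01:
        return "rare"
    return "other"


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equally long sequences of categorical calls.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from the marginal frequencies.
    """
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical category; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy genotypes coded as alternate-allele counts (assumed coding, invented data).
imputed = [0, 0, 1, 2, 0, 1, 0, 0]
wes = [0, 0, 1, 2, 0, 0, 0, 0]
kappa = cohen_kappa(imputed, wes)
```

Kappa corrects raw concordance for chance agreement, which matters for rare variants because most genotypes are homozygous reference and raw concordance is high even when the rare calls disagree.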