Extremely low-coverage whole genome sequencing in South Asians captures population genomics information

Abstract. Background: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at...

Bibliographic Details
Main Authors: Navin Rustagi, Anbo Zhou, W. Scott Watkins, Erika Gedvilaite, Shuoguo Wang, Naveen Ramesh, Donna Muzny, Richard A. Gibbs, Lynn B. Jorde, Fuli Yu, Jinchuan Xing
Format: Article
Language: English
Published: BMC 2017-05-01
Series: BMC Genomics
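Subjects: Single nucleotide variant; Whole genome sequencing; South Asian; Extremely low coverage; Population structure; Imputation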
Online Access: http://link.springer.com/article/10.1186/s12864-017-3767-6
id doaj-082816d78c974cf38b2f17d92c98c203
record_format Article
spelling doaj-082816d78c974cf38b2f17d92c98c203
2020-11-24T22:17:11Z
eng
BMC
BMC Genomics
1471-2164
2017-05-01
18 1 1 12
10.1186/s12864-017-3767-6
Extremely low-coverage whole genome sequencing in South Asians captures population genomics information
Navin Rustagi (0)
Anbo Zhou (1)
W. Scott Watkins (2)
Erika Gedvilaite (3)
Shuoguo Wang (4)
Naveen Ramesh (5)
Donna Muzny (6)
Richard A. Gibbs (7)
Lynn B. Jorde (8)
Fuli Yu (9)
Jinchuan Xing (10)
(0) Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine
(1) Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey
(2) Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah
(3) Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey
(4) Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey
(5) Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine
(6) Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine
(7) Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine
(8) Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah
(9) Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine
(10) Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey
Abstract. Background: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low-coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low-coverage WGS in populations with a complex history and no reference panel remains to be determined. Results: South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high coverage of the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. Conclusions: Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.
http://link.springer.com/article/10.1186/s12864-017-3767-6
Single nucleotide variant
Whole genome sequencing
South Asian
Extremely low coverage
Population structure
Imputation
collection DOAJ
language English
format Article
sources DOAJ
author Navin Rustagi
Anbo Zhou
W. Scott Watkins
Erika Gedvilaite
Shuoguo Wang
Naveen Ramesh
Donna Muzny
Richard A. Gibbs
Lynn B. Jorde
Fuli Yu
Jinchuan Xing
title Extremely low-coverage whole genome sequencing in South Asians captures population genomics information
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2017-05-01
description Abstract. Background: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low-coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low-coverage WGS in populations with a complex history and no reference panel remains to be determined. Results: South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high coverage of the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. Conclusions: Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.
topic Single nucleotide variant
Whole genome sequencing
South Asian
Extremely low coverage
Population structure
Imputation
url http://link.springer.com/article/10.1186/s12864-017-3767-6