Extremely low-coverage whole genome sequencing in South Asians captures population genomics information
Abstract. Background: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at...
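Main Authors: | Navin Rustagi, Anbo Zhou, W. Scott Watkins, Erika Gedvilaite, Shuoguo Wang, Naveen Ramesh, Donna Muzny, Richard A. Gibbs, Lynn B. Jorde, Fuli Yu, Jinchuan Xing |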
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2017-05-01 |
Series: | BMC Genomics |
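Subjects: | Single nucleotide variant; Whole genome sequencing; South Asian; Extremely low coverage; Population structure; Imputation |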
Online Access: | http://link.springer.com/article/10.1186/s12864-017-3767-6 |
id |
doaj-082816d78c974cf38b2f17d92c98c203 |
---|---|
record_format |
Article |
spelling |
doaj-082816d78c974cf38b2f17d92c98c203; 2020-11-24T22:17:11Z; eng; BMC; BMC Genomics; 1471-2164; 2017-05-01; 18; 1; 1; 12; 10.1186/s12864-017-3767-6; Extremely low-coverage whole genome sequencing in South Asians captures population genomics information.
Authors: Navin Rustagi (Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine); Anbo Zhou (Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey); W. Scott Watkins (Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah); Erika Gedvilaite (Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey); Shuoguo Wang (Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey); Naveen Ramesh (Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine); Donna Muzny (Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine); Richard A. Gibbs (Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine); Lynn B. Jorde (Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah); Fuli Yu (Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine); Jinchuan Xing (Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey).
Abstract. Background: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. Results: South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high-coverage for the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. Conclusions: Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.
http://link.springer.com/article/10.1186/s12864-017-3767-6; Single nucleotide variant; Whole genome sequencing; South Asian; Extremely low coverage; Population structure; Imputation |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Navin Rustagi; Anbo Zhou; W. Scott Watkins; Erika Gedvilaite; Shuoguo Wang; Naveen Ramesh; Donna Muzny; Richard A. Gibbs; Lynn B. Jorde; Fuli Yu; Jinchuan Xing |
author_sort |
Navin Rustagi |
title |
Extremely low-coverage whole genome sequencing in South Asians captures population genomics information |
title_sort |
extremely low-coverage whole genome sequencing in south asians captures population genomics information |
publisher |
BMC |
series |
BMC Genomics |
issn |
1471-2164 |
publishDate |
2017-05-01 |
description |
Abstract. Background: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. Results: South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high-coverage for the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. Conclusions: Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies. |
topic |
Single nucleotide variant; Whole genome sequencing; South Asian; Extremely low coverage; Population structure; Imputation |
url |
http://link.springer.com/article/10.1186/s12864-017-3767-6 |
_version_ |
1725786269371334656 |