Inference of population structure using dense haplotype data.

The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haploty...

Bibliographic Details
Main Authors:	Daniel John Lawson, Garrett Hellenthal, Simon Myers, Daniel Falush
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2012-01-01
Series:	PLoS Genetics
Online Access:	http://europepmc.org/articles/PMC3266881?pdf=render

id	doaj-3ef6632f9c674e9abecb89a660a554f4
record_format	Article
spelling	doaj-3ef6632f9c674e9abecb89a660a554f4 2020-11-25T00:53:56Z eng Public Library of Science (PLoS) PLoS Genetics 1553-7390 1553-7404 2012-01-01 8 1 e1002453 10.1371/journal.pgen.1002453 Inference of population structure using dense haplotype data. Daniel John Lawson, Garrett Hellenthal, Simon Myers, Daniel Falush
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Daniel John Lawson Garrett Hellenthal Simon Myers Daniel Falush
title	Inference of population structure using dense haplotype data.
publisher	Public Library of Science (PLoS)
series	PLoS Genetics
issn	1553-7390 1553-7404
publishDate	2012-01-01
description	The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
url	http://europepmc.org/articles/PMC3266881?pdf=render
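
The "coancestry matrix" mentioned in the description can be illustrated with a small sketch for the independent-marker case. This is not the ChromoPainter implementation: it assumes, as a simplification, that at each site a recipient copies from a donor chosen uniformly among the other haplotypes carrying the same allele. The function name `toy_coancestry` and the toy haplotypes are hypothetical.

```python
from typing import List


def toy_coancestry(haplotypes: List[List[int]]) -> List[List[float]]:
    """Expected number of sites each recipient (row) copies from each donor (column).

    Assumption (simplified, not the published model): markers are independent,
    and at every site the recipient copies from a donor drawn uniformly among
    the other haplotypes that carry the same allele. Sites where no other
    haplotype matches contribute nothing.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    matrix = [[0.0] * n for _ in range(n)]
    for site in range(n_sites):
        for recipient in range(n):
            allele = haplotypes[recipient][site]
            # Candidate donors: every other haplotype sharing this allele.
            donors = [d for d in range(n)
                      if d != recipient and haplotypes[d][site] == allele]
            for d in donors:
                matrix[recipient][d] += 1.0 / len(donors)
    return matrix


# Hypothetical toy data: two haplotypes from each of two groups, four biallelic sites.
haps = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
coancestry = toy_coancestry(haps)
for row in coancestry:
    print(row)
```

In this toy input, each recipient receives most of its sites from the other haplotype in its own group, which is the kind of block pattern a clustering method or PCA could then pick up. With linked markers the published approach combines information across successive markers, which this sketch does not attempt.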

Similar Items