PoolHap: inferring haplotype frequencies from pooled samples by next generation sequencing.

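With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it is difficult or impossible to partition the subtypes experimentally before sequencing, so their frequencies must be inferred. In addition, investigators may occasionally want to pool samples from a large number of individuals artificially for cost-efficiency, e.g., when carrying out genetic mapping by bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when the haplotypes are known. The key insight behind PoolHap is that the large number of SNPs that come with genome-wide coverage can compensate for uneven coverage across the genome. We illustrate and discuss the performance of PoolHap using simulated and real data. Using Arabidopsis thaliana whole-genome polymorphism data, we show that PoolHap can estimate the proportions of haplotypes in 34-strain mixtures sequenced to 2X total coverage with less than 2% error. This method should facilitate greater biological insight into heterogeneous samples whose components are difficult or impossible to isolate experimentally. Software and a user manual are freely available at http://arabidopsis.gmi.oeaw.ac.at/quan/poolhap/.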
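The idea that many low-coverage SNPs can jointly pin down haplotype proportions can be illustrated with a toy model. The sketch below is not PoolHap's algorithm; it is a minimal, hypothetical example that fits mixing proportions by expectation-maximization (EM) under these assumptions: the haplotypes are known and biallelic, every read covers one SNP and is an independent draw from the pool, and alleles are miscalled at a fixed rate. The function names `simulate_pool` and `estimate_proportions` and the `error` and `max_depth` parameters are illustrative inventions. SNPs that share the same allele pattern across haplotypes have identical likelihoods, so their reads are pooled, which is how sparse and uneven per-SNP coverage adds up to a usable signal.

```python
import random
from collections import defaultdict

# Hypothetical illustration only: a plain EM mixture model, NOT PoolHap's method.


def simulate_pool(haplotypes, proportions, max_depth=4, error=0.01, seed=0):
    """Simulate alt/ref read counts per SNP from a pool of known haplotypes."""
    rng = random.Random(seed)
    alt, ref = [], []
    for s in range(len(haplotypes[0])):
        depth = rng.randint(0, max_depth)  # uneven coverage; some SNPs get no reads
        a = 0
        for _ in range(depth):
            # Each read comes from haplotype k with probability proportions[k].
            k = rng.choices(range(len(haplotypes)), weights=proportions)[0]
            allele = haplotypes[k][s]
            if rng.random() < error:  # assumed symmetric miscall
                allele = 1 - allele
            a += allele
        alt.append(a)
        ref.append(depth - a)
    return alt, ref


def estimate_proportions(haplotypes, alt_counts, ref_counts, error=0.01,
                         max_iter=10_000, tol=1e-12):
    """EM estimate of haplotype proportions from pooled read counts (0/1 alleles)."""
    k = len(haplotypes)
    # Pool reads over SNPs that share an allele pattern across haplotypes.
    pooled = defaultdict(lambda: [0, 0])  # pattern -> [alt reads, ref reads]
    for s, (a, r) in enumerate(zip(alt_counts, ref_counts)):
        pattern = tuple(h[s] for h in haplotypes)
        pooled[pattern][0] += a
        pooled[pattern][1] += r
    total = sum(a + r for a, r in pooled.values())
    if total == 0:
        raise ValueError("no reads to fit")

    p = [1.0 / k] * k  # start from equal proportions
    for _ in range(max_iter):
        expected = [0.0] * k  # expected number of reads drawn from each haplotype
        for pattern, (a, r) in pooled.items():
            for count, allele in ((a, 1), (r, 0)):
                if count == 0:
                    continue
                # E-step: posterior weight that a read showing `allele` came from haplotype j.
                w = [p[j] * ((1 - error) if pattern[j] == allele else error)
                     for j in range(k)]
                z = sum(w)
                for j in range(k):
                    expected[j] += count * w[j] / z
        # M-step: new proportion = expected share of all reads.
        new_p = [e / total for e in expected]
        converged = max(abs(x - y) for x, y in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    rng = random.Random(42)
    haps = [[rng.randint(0, 1) for _ in range(5000)] for _ in range(4)]
    truth = [0.4, 0.3, 0.2, 0.1]
    alt, ref = simulate_pool(haps, truth, seed=7)
    print([round(x, 3) for x in estimate_proportions(haps, alt, ref)])
```

In this toy setup each SNP receives only about two reads on average, yet the estimates land close to the true proportions because thousands of SNPs contribute. Real data add complications this sketch ignores, such as reads spanning several SNPs, mapping bias, and haplotypes that are nearly identical.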

Bibliographic Details
Main Authors: Quan Long, Daniel C Jeffares, Qingrun Zhang, Kai Ye, Viktoria Nizhynska, Zemin Ning, Chris Tyler-Smith, Magnus Nordborg
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2011-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3016441?pdf=render
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0015292
Citation: PLoS ONE 6(1): e15292