A survey of methods and tools to detect recent and strong positive selection


Bibliographic Details
Main Authors:	Pavlos Pavlidis, Nikolaos Alachiotis
Format:	Article
Language:	English
Published:	BMC 2017-04-01
Series:	Journal of Biological Research - Thessaloniki
ISSN:	2241-5793
DOI:	10.1186/s40709-017-0064-0
Subjects:	Positive selection; Selective sweep
Online Access:	http://link.springer.com/article/10.1186/s40709-017-0064-0

Abstract
Positive selection occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and, due to genetic hitchhiking, neighboring linked variation diminishes, creating so-called selective sweeps. Traces of positive selection in genomes are detected by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum (SFS), and particular linkage disequilibrium (LD) patterns in the region. A variety of methods and tools can be used to detect sweeps, ranging from simple implementations that compute summary statistics such as Tajima’s D to more advanced statistical approaches that use combinations of statistics, maximum likelihood, machine learning, and other techniques. In this survey, we present and discuss summary statistics and software tools, and classify them based on the selective sweep signature they detect, i.e., SFS-based vs. LD-based, as well as on their capacity to analyze whole genomes or just subgenomic regions.

Additionally, we summarize the results of comparisons among four open-source software releases (SweeD, SweepFinder, SweepFinder2, and OmegaPlus) regarding sensitivity, specificity, and execution times. In equilibrium neutral models or mild bottlenecks, both SFS- and LD-based methods are able to detect selective sweeps accurately. Methods and tools that rely on LD exhibit higher true positive rates than SFS-based ones under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to represent the null hypothesis. When the correct demographic model (or one close to it) is used instead, the false positive rates are considerably reduced. The accuracy of detecting the true target of selection is decreased in bottleneck scenarios. In terms of execution time, LD-based methods are typically faster than SFS-based methods, due to the nature of the required arithmetic.
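To make the SFS-based vs. LD-based distinction in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not code from SweeD, SweepFinder, SweepFinder2, or OmegaPlus, and every function name is hypothetical. It computes Tajima’s D (an SFS-based summary statistic) and a simplified form of the ω statistic of Kim and Nielsen, the LD-based statistic that OmegaPlus evaluates. The input is assumed to be a toy matrix of biallelic haplotypes coded 0/1 (1 = derived allele), and ω is maximized over every split point of a single window of SNPs, rather than over the genomic grid positions and window sizes a real scan would use.

```python
from itertools import combinations
from math import comb, inf, nan, sqrt

# Hypothetical helpers for illustration only; not the API of any sweep tool.
# Input: list of haplotypes, each a list of 0/1 alleles (1 = derived).


def site_frequency_spectrum(haps):
    """Unfolded SFS: xi[i] = number of sites carrying i derived alleles."""
    n = len(haps)
    xi = [0] * (n + 1)
    for site in zip(*haps):          # iterate over columns (sites)
        xi[sum(site)] += 1
    return xi


def tajimas_d(haps):
    """Tajima's D from the SFS: (pi - S/a1) / sqrt(e1*S + e2*S*(S-1))."""
    n = len(haps)
    xi = site_frequency_spectrum(haps)
    s = sum(xi[1:n])                 # number of segregating sites S
    if s == 0:
        return nan                   # D is undefined without variation
    pi = sum(i * (n - i) * xi[i] for i in range(1, n)) / comb(n, 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def r_squared(col_a, col_b):
    """LD measure r^2 between two biallelic, polymorphic sites."""
    n = len(col_a)
    pa, pb = sum(col_a) / n, sum(col_b) / n
    pab = sum(x * y for x, y in zip(col_a, col_b)) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def omega_max(haps):
    """Simplified Kim-Nielsen omega, maximized over split points of one window."""
    n = len(haps)
    cols = [c for c in zip(*haps) if 0 < sum(c) < n]   # keep SNPs only
    s = len(cols)
    if s < 4:
        return nan                   # need at least 2 SNPs on each side
    r2 = {(i, j): r_squared(cols[i], cols[j])
          for i, j in combinations(range(s), 2)}
    best = 0.0
    for l in range(2, s - 1):        # left = SNPs [0, l), right = [l, s)
        within = sum(v for (i, j), v in r2.items() if j < l or i >= l)
        between = sum(v for (i, j), v in r2.items() if i < l <= j)
        num = within / (comb(l, 2) + comb(s - l, 2))
        den = between / (l * (s - l))
        if den > 0:
            best = max(best, num / den)
        elif num > 0:
            best = inf               # LD within each side, none across
    return best


if __name__ == "__main__":
    # Toy data: two LD blocks (sites 0-1 and 2-3) with no LD between them.
    blocks = [[1, 1, 1, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 0]]
    print(tajimas_d(blocks))   # positive: only intermediate-frequency variants
    print(omega_max(blocks))   # inf: r^2 = 1 within blocks, 0 across them
```

A negative D reflects an excess of rare variants, the SFS shift expected shortly after a sweep, while a large ω reflects high LD on each side of a split and low LD across it, the pattern expected on either flank of a swept site. Enumerating all pairwise r² values is quadratic in the number of SNPs; production tools reuse these values across overlapping windows instead of recomputing them.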
