A Ranking Approach to Genomic Selection.

Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression proble...

Bibliographic Details
Main Authors:	Mathieu Blondel, Akio Onogi, Hiroyoshi Iwata, Naonori Ueda
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2015-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC4466774?pdf=render

id	doaj-e27679ac555548ed871592dff7143dfd
record_format	Article
spelling	doaj-e27679ac555548ed871592dff7143dfd 2020-11-25T02:42:38Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2015-01-01 10 6 e0128570 10.1371/journal.pone.0128570 A Ranking Approach to Genomic Selection. Mathieu Blondel Akio Onogi Hiroyoshi Iwata Naonori Ueda Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS. http://europepmc.org/articles/PMC4466774?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Mathieu Blondel Akio Onogi Hiroyoshi Iwata Naonori Ueda
spellingShingle	Mathieu Blondel Akio Onogi Hiroyoshi Iwata Naonori Ueda A Ranking Approach to Genomic Selection. PLoS ONE
author_facet	Mathieu Blondel Akio Onogi Hiroyoshi Iwata Naonori Ueda
author_sort	Mathieu Blondel
title	A Ranking Approach to Genomic Selection.
title_short	A Ranking Approach to Genomic Selection.
title_full	A Ranking Approach to Genomic Selection.
title_fullStr	A Ranking Approach to Genomic Selection.
title_full_unstemmed	A Ranking Approach to Genomic Selection.
title_sort	ranking approach to genomic selection.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2015-01-01
description	Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
url	http://europepmc.org/articles/PMC4466774?pdf=render
work_keys_str_mv	AT mathieublondel arankingapproachtogenomicselection AT akioonogi arankingapproachtogenomicselection AT hiroyoshiiwata arankingapproachtogenomicselection AT naonoriueda arankingapproachtogenomicselection AT mathieublondel rankingapproachtogenomicselection AT akioonogi rankingapproachtogenomicselection AT hiroyoshiiwata rankingapproachtogenomicselection AT naonoriueda rankingapproachtogenomicselection
_version_	1724772493076463616
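
The abstract in this record describes NDCG as a measure that rewards models which rank individuals with high breeding value near the top. As a rough illustration only, the sketch below uses one common formulation: a linear gain with a 1/log2(rank + 1) discount, truncated at the top k individuals. The function names `dcg_at_k` and `ndcg_at_k`, the choice of gain and discount, and the assumption of non-negative trait values are illustrative and not taken from the record; the article's exact definition may differ.

```python
import math


def dcg_at_k(true_values, predicted_scores, k):
    """Discounted cumulative gain of the top-k individuals by predicted score.

    Assumes non-negative true_values (hypothetical breeding values).
    """
    # Indices of individuals, best predicted score first.
    order = sorted(range(len(true_values)),
                   key=lambda i: predicted_scores[i],
                   reverse=True)
    # Position 0 gets discount 1/log2(2) = 1; later positions count less.
    return sum(true_values[i] / math.log2(pos + 2)
               for pos, i in enumerate(order[:k]))


def ndcg_at_k(true_values, predicted_scores, k):
    """DCG normalized by the DCG of the ideal (true-value) ordering."""
    ideal = dcg_at_k(true_values, true_values, k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(true_values, predicted_scores, k) / ideal


# Toy data, invented for illustration.
observed = [3.0, 1.0, 2.0]
good_model = [0.9, 0.1, 0.5]   # ranks individuals in the true order
poor_model = [0.1, 0.9, 0.5]   # puts the worst individual first

print(ndcg_at_k(observed, good_model, k=2))
print(ndcg_at_k(observed, poor_model, k=2))
```

Under this formulation, a model that orders the top k individuals exactly as their observed values do scores 1, and errors near the top of the ranking cost more than errors further down.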

Similar Items