Predicted Residual Error Sum of Squares of Mixed Models: An Application for Genomic Prediction

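Genomic prediction is a statistical method for predicting the phenotypes of polygenic traits from high-throughput genomic data. Most diseases and behaviors in humans and animals are polygenic traits, as are the majority of agronomic traits in crops. Accurate prediction of these traits can help medical professionals diagnose acute diseases and help breeders increase food production, and can therefore contribute significantly to human health and global food security. Best linear unbiased prediction (BLUP) is an important tool for analyzing high-throughput genomic data for prediction. However, to judge the efficacy of a BLUP model with a particular set of predictors for a given trait, one needs an unbiased mechanism for evaluating predictability. Cross-validation (CV) is an essential tool for this purpose: the sample is partitioned into K parts of roughly equal size, each part is predicted using parameters estimated from the remaining K − 1 parts, and in the end every part has been predicted from a sample that excludes it. This scheme is called K-fold CV. Unfortunately, CV substantially increases the computational burden, because the model must be refitted once per fold. We developed an alternative method, the HAT method, to replace CV. The new method corrects the residual errors estimated from the whole-sample analysis using the leverage values of a hat matrix of the random effects, yielding the predicted residual errors. Properties of the HAT method were investigated using seven agronomic traits and 1000 metabolomic traits of an inbred rice population. The results showed that the HAT method is a very good approximation of the CV method. The method was also applied to 10 traits in 1495 rice hybrids with 1.6 million SNPs, and to height in 6161 human subjects with roughly 0.5 million SNPs from the Framingham Heart Study data. The predictabilities of the HAT and CV methods were similar in all cases. The HAT method allows us to easily evaluate the predictability of genomic prediction for large numbers of traits in very large populations.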
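The abstract describes the leverage correction only in words. As a hedged illustration, here is a minimal sketch of the classical identity that such corrections build on: if the whole-sample fitted values are ŷ = Hy, the predicted residual of individual i is e_(i) = e_i / (1 − h_ii), where e_i = y_i − ŷ_i and h_ii is the i-th diagonal (leverage) of H, and PRESS = Σ e_(i)². The sketch assumes a kernel BLUP with no fixed effects, a hypothetical kinship matrix K = ZZ'/p built from toy marker codes, and a variance ratio λ = σ²_e/σ²_g that is held fixed rather than re-estimated, so that H = K(K + λI)⁻¹. Under those assumptions the identity is exact for leave-one-out CV, so the two functions should agree to rounding error. All function names and data are made up for illustration; this is not the author's HAT implementation.

```python
# Sketch (not the author's code): leverage-corrected predicted residuals
# versus brute-force leave-one-out CV for a kernel BLUP with a FIXED
# variance ratio lam = sigma_e^2 / sigma_g^2 and no fixed effects.


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]  # augmented [A | B]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def kinship(z):
    """Hypothetical marker-based kinship K = Z Z' / p (markers used as coded)."""
    p = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / p for zj in z] for zi in z]


def hat_press(k, y, lam):
    """Whole-sample fit, then inflate each residual by 1 / (1 - h_ii)."""
    n = len(y)
    v = [[k[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    vinv = solve(v, eye)
    # Hat matrix of the random effects: H = K (K + lam I)^{-1}
    h = [[sum(k[i][t] * vinv[t][j] for t in range(n)) for j in range(n)]
         for i in range(n)]
    fitted = [sum(h[i][j] * y[j] for j in range(n)) for i in range(n)]
    res = [(y[i] - fitted[i]) / (1.0 - h[i][i]) for i in range(n)]
    return res, sum(r * r for r in res)


def loo_press(k, y, lam):
    """Brute force: refit without individual i, then predict y_i."""
    n = len(y)
    res = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        v = [[k[a][b] + (lam if a == b else 0.0) for b in keep] for a in keep]
        alpha = solve(v, [[y[a]] for a in keep])  # (K_-i + lam I)^{-1} y_-i
        pred = sum(k[i][j] * alpha[t][0] for t, j in enumerate(keep))
        res.append(y[i] - pred)
    return res, sum(r * r for r in res)


if __name__ == "__main__":
    # Toy data (made up): 6 individuals x 4 markers coded 0/1/2.
    z = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 0],
         [0, 2, 2, 1], [2, 0, 1, 2], [1, 2, 0, 0]]
    y = [1.2, -0.4, 0.3, 1.5, -0.8, 0.1]
    k = kinship(z)
    print("HAT PRESS:", hat_press(k, y, 1.0)[1])
    print("LOO PRESS:", loo_press(k, y, 1.0)[1])
```

Running the script prints two PRESS values that should match up to floating-point rounding, even though the HAT-style version fits the model only once. The linear algebra is written in pure Python only to keep the sketch dependency-free; real genomic data would call for a proper linear-algebra library. With K-fold CV and variance components re-estimated in every fold, the single-fit correction is an approximation rather than an identity, which is consistent with the abstract's description of the HAT method as a very good approximation of CV.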

Bibliographic Details
Main Author: Shizhong Xu
Format: Article
Language: English
Published: Oxford University Press 2017-03-01
Series: G3: Genes, Genomes, Genetics
ISSN: 2160-1836
Volume/Issue/Pages: 7(3), 895-909
DOI: 10.1534/g3.116.038059
Subjects: best linear unbiased prediction; cross-validation; generalized cross-validation; genomic selection; hybrid breeding; mixed model; Gen Pred; Shared data resource
Online Access: http://g3journal.org/lookup/doi/10.1534/g3.116.038059