Using Local Convolutional Neural Networks for Genomic Prediction

The prediction of breeding values and phenotypes is of central importance for both livestock and crop breeding. In this study, we analyze the use of artificial neural networks (ANN) and, in particular, local convolutional neural networks (LCNN) for genomic prediction, as a region-specific filter cor...


Bibliographic Details
Main Authors: Torsten Pook, Jan Freudenthal, Arthur Korte, Henner Simianer
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-11-01
Series: Frontiers in Genetics
Subjects: phenotype prediction; Keras; genomic selection; selection; breeding; machine learning
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2020.561497/full
id doaj-37a332f59ca549ddaba9e124e8057dd5
record_format Article
spelling doaj-37a332f59ca549ddaba9e124e8057dd5
2020-11-25T04:00:16Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2020-11-01
11
10.3389/fgene.2020.561497
561497
Using Local Convolutional Neural Networks for Genomic Prediction
Torsten Pook (Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Goettingen, Göttingen, Germany)
Jan Freudenthal (Center for Computational and Theoretical Biology, University of Wuerzburg, Wuerzburg, Germany)
Arthur Korte (Center for Computational and Theoretical Biology, University of Wuerzburg, Wuerzburg, Germany)
Henner Simianer (Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Goettingen, Göttingen, Germany)
collection DOAJ
language English
format Article
sources DOAJ
author Torsten Pook
Jan Freudenthal
Arthur Korte
Henner Simianer
title Using Local Convolutional Neural Networks for Genomic Prediction
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2020-11-01
description The prediction of breeding values and phenotypes is of central importance for both livestock and crop breeding. In this study, we analyze the use of artificial neural networks (ANNs) and, in particular, local convolutional neural networks (LCNNs) for genomic prediction, because a region-specific filter corresponds much better to our prior knowledge of the genetic architecture of traits than the shared filters of traditional convolutional neural networks. Model performance is evaluated by predictive ability on a simulated maize data panel (n = 10,000; p = 34,595) and on real Arabidopsis data (n = 2,039; p = 180,000) for a variety of traits. The baseline LCNN, which contains one local convolutional layer (kernel size: 10) and two fully connected layers with 64 nodes each, outperforms commonly proposed ANNs (multilayer perceptrons and convolutional neural networks) for almost all considered traits. For traits with high heritability and a large training population, as present in the simulated data, LCNNs even outperform state-of-the-art methods such as genomic best linear unbiased prediction (GBLUP), Bayesian models, and extended GBLUP, with predictive ability increasing by up to 24%. For small training populations, however, these state-of-the-art methods outperform all considered ANNs. Nevertheless, the LCNN still outperforms all other considered ANNs by around 10%. Minor improvements over the tested baseline architecture of the LCNN were obtained by increasing the kernel size and reducing the stride, whereas the number of subsequent fully connected layers and their node counts had a negligible impact. Although LCNNs yielded gains in predictive ability for large-scale data sets, the practical use of ANNs comes with additional problems, such as the need to genotype all considered individuals and the lack of estimates of heritability and reliability. Furthermore, breeding values are additive by design, whereas ANN-based estimates are not. However, ANNs also come with new opportunities, as networks can easily be extended to account for additional inputs (omics, weather, etc.) and outputs (multi-trait models), and computing time increases linearly with the number of individuals. With advances in high-throughput phenotyping and cheaper genotyping, ANNs can become a valid alternative for genomic prediction.
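The baseline architecture named in the description (one local convolutional layer with kernel size 10, followed by two fully connected layers with 64 nodes each) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the record lists Keras as a keyword, whereas the sketch below uses plain NumPy so that it stays self-contained. The stride (set equal to the kernel size), the ReLU activations, the random initialization, the single output node, and all function and parameter names are assumptions made for illustration only. The point it shows is the difference from an ordinary convolution: every genomic window has its own filter, so weights are not shared across regions.

```python
import numpy as np


def local_conv1d(x, weights, bias, stride):
    """Locally connected 1D layer: one separate filter per window (no weight sharing).

    x:       (n_individuals, n_markers) genotype matrix
    weights: (n_windows, kernel_size), one filter per genomic window
    bias:    (n_windows,)
    """
    n_windows, kernel_size = weights.shape
    # Marker indices covered by each window, shape (n_windows, kernel_size).
    idx = stride * np.arange(n_windows)[:, None] + np.arange(kernel_size)[None, :]
    windows = x[:, idx]  # (n_individuals, n_windows, kernel_size)
    # Dot product of each window with its own filter.
    return np.einsum("nwk,wk->nw", windows, weights) + bias


def relu(z):
    return np.maximum(z, 0.0)


def init_lcnn(n_markers, kernel_size=10, stride=10, hidden=64, seed=0):
    """Random parameters; stride, scale, and seed are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n_windows = (n_markers - kernel_size) // stride + 1
    return {
        "stride": stride,
        "w_local": rng.normal(0.0, 0.1, (n_windows, kernel_size)),
        "b_local": np.zeros(n_windows),
        "w1": rng.normal(0.0, 0.1, (n_windows, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, hidden)),
        "b2": np.zeros(hidden),
        "w_out": rng.normal(0.0, 0.1, (hidden, 1)),
        "b_out": np.zeros(1),
    }


def lcnn_forward(params, x):
    """Local convolution, then two dense layers of 64 nodes, then one output node."""
    h = relu(local_conv1d(x, params["w_local"], params["b_local"], params["stride"]))
    h = relu(h @ params["w1"] + params["b1"])
    h = relu(h @ params["w2"] + params["b2"])
    return (h @ params["w_out"] + params["b_out"]).ravel()


if __name__ == "__main__":
    # Toy genotypes coded 0/1/2 for 20 individuals and 200 markers (made-up data).
    rng = np.random.default_rng(42)
    genotypes = rng.integers(0, 3, size=(20, 200)).astype(float)
    params = init_lcnn(n_markers=200)
    predictions = lcnn_forward(params, genotypes)
    print(predictions.shape)
```

With a kernel size and stride of 10, the 200 toy markers are split into 20 non-overlapping windows, each with its own 10 weights. Reducing the stride below the kernel size, which the description reports as a minor improvement, would make the windows overlap and increase the number of window-specific filters.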
topic phenotype prediction
Keras
genomic selection
selection
breeding
machine learning
url https://www.frontiersin.org/articles/10.3389/fgene.2020.561497/full