ICGRM: integrative construction of genomic relationship matrix combining multiple genomic regions for big dataset

Abstract. Background: Genomic prediction is an advanced method for estimating genetic values and is widely used for genetic evaluation in animals and disease-risk prediction in humans. It estimates genetic values from genome-wide SNPs instead of pedigree. The key step of it is t...

Bibliographic Details
Main Authors: Dan Jiang, Cong Xin, Jinhua Ye, Yingbo Yuan, Ming Fang
Format: Article
Language: English
Published: BMC 2019-12-01
Series: BMC Bioinformatics
Subjects: Genomic relationship matrix; Genomic selection; GBLUP
Online Access: https://doi.org/10.1186/s12859-019-3319-y
id doaj-178a6dd310514fa6a57bdfaadf1b3071
record_format Article
spelling doaj-178a6dd310514fa6a57bdfaadf1b3071; 2020-12-27T12:21:50Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-12-01; 20115; 10.1186/s12859-019-3319-y
Dan Jiang (0), Cong Xin (1), Jinhua Ye (2), Yingbo Yuan (3), Ming Fang (4)
(0) Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Fisheries College, Jimei University
(1) Institute of Dermatology and Department of Dermatology, the First Affiliated Hospital of Anhui Medical University
(2) College of Science, Heilongjiang Bayi Agricultural University
(3) Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Fisheries College, Jimei University
(4) Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Fisheries College, Jimei University
collection DOAJ
language English
format Article
sources DOAJ
author Dan Jiang
Cong Xin
Jinhua Ye
Yingbo Yuan
Ming Fang
title ICGRM: integrative construction of genomic relationship matrix combining multiple genomic regions for big dataset
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-12-01
description Abstract. Background: Genomic prediction is an advanced method for estimating genetic values and is widely used for genetic evaluation in animals and disease-risk prediction in humans. It estimates genetic values from genome-wide SNPs instead of pedigree. Its key step is constructing the genomic relationship matrix (GRM) from genome-wide SNPs; however, computing the GRM usually requires a large amount of memory, especially when the number of SNPs and the sample size are large, and can become computationally prohibitive even on supercomputer clusters. We developed an integrative algorithm, ICGRM, to compute the GRM. Rather than computing the GRM for the whole genome at once, ICGRM divides the genome-wide SNPs into an arbitrary number of segments, computes GRM-related summary statistics for each segment, which requires little RAM, and then integrates these summary statistics to produce the whole-genome GRM. Results: In our simulated dataset, splitting the genome-wide SNPs into 5 to 200 parts by SNP count reduced ICGRM's memory usage about 15-fold (from 218 Gb to 14 Gb), making the computation feasible on almost any computer server. ICGRM is implemented in C/C++ and is freely available at https://github.com/mingfang618/CLGRM. Conclusions: ICGRM is computationally efficient software for building the GRM and can be used for big datasets.
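The segment-then-integrate idea in the abstract can be illustrated with a small sketch. The record does not say which GRM formula or which summary statistics ICGRM uses, so this example assumes a VanRaden-style GRM, G = ZZ' / (2 Σ p_j(1 − p_j)), where Z holds genotypes centered by twice the allele frequency. Under that assumption, each segment contributes a partial cross-product and a partial scaling term, and both are simply summed. The function names (`segment_stats`, `integrate`, `split_columns`) and the toy genotypes are hypothetical and are not taken from the ICGRM source code.

```python
from typing import Iterable, List, Sequence, Tuple

Matrix = List[List[float]]


def segment_stats(genotypes: Sequence[Sequence[int]]) -> Tuple[Matrix, float]:
    """Summary statistics for one SNP segment (assumed VanRaden-style GRM).

    genotypes[i][j] is the 0/1/2 allele count of individual i at SNP j.
    Returns the n x n cross-product of centered genotypes and the
    segment's share of the scaling term, 2 * sum(p * (1 - p)).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Per-SNP allele frequency, computed across all individuals.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    # Center each genotype by twice its SNP's allele frequency.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    cross = [
        [sum(a * b for a, b in zip(centered[i], centered[k])) for k in range(n)]
        for i in range(n)
    ]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return cross, scale


def integrate(stats: Iterable[Tuple[Matrix, float]]) -> Matrix:
    """Sum per-segment statistics, then divide once to get the full GRM."""
    total = None
    total_scale = 0.0
    for cross, scale in stats:
        if total is None:
            total = [[0.0] * len(cross) for _ in cross]
        for i, row in enumerate(cross):
            for k, value in enumerate(row):
                total[i][k] += value
        total_scale += scale
    if total is None or total_scale == 0.0:
        raise ValueError("no informative SNPs to build a GRM from")
    return [[value / total_scale for value in row] for row in total]


def split_columns(genotypes: Sequence[Sequence[int]], n_segments: int) -> List[List[List[int]]]:
    """Split the SNP columns into contiguous segments of near-equal size."""
    m = len(genotypes[0])
    size = -(-m // n_segments)  # ceiling division
    return [[list(row[s:s + size]) for row in genotypes] for s in range(0, m, size)]


# Toy data: 3 individuals x 4 SNPs (made up for illustration).
genotypes = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1]]
grm_whole = integrate([segment_stats(genotypes)])
grm_split = integrate(segment_stats(seg) for seg in split_columns(genotypes, 2))
max_diff = max(
    abs(a - b) for rw, rs in zip(grm_whole, grm_split) for a, b in zip(rw, rs)
)
print(max_diff < 1e-12)
```

Because `integrate` consumes the per-segment statistics one at a time, only one segment's genotypes need to be in memory at once; in a real workflow each segment would presumably be read from disk rather than sliced from an in-memory list.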
topic Genomic relationship matrix
Genomic selection
GBLUP
url https://doi.org/10.1186/s12859-019-3319-y