integRATE: a desirability-based data integration framework for the prioritization of candidate genes across heterogeneous omics and its application to preterm birth

Abstract. Background: The integration of high-quality, genome-wide analyses offers a robust approach to elucidating genetic factors involved in complex human diseases. Even though several methods exist to integrate heterogeneous omics data, most biologists still manually select candidate genes by exam...


Bibliographic Details
Main Authors: Haley R. Eidem, Jacob L. Steenwyk, Jennifer H. Wisecaver, John A. Capra, Patrick Abbot, Antonis Rokas
Format: Article
Language: English
Published: BMC 2018-11-01
Series: BMC Medical Genomics
Subjects: Prematurity; Integrative genomics; Complex disease; Candidate gene ranking; Venn diagram
Online Access: http://link.springer.com/article/10.1186/s12920-018-0426-y
id doaj-b8b96cd7d3bd48bb955a72713fd38a33
record_format Article
spelling doaj-b8b96cd7d3bd48bb955a72713fd38a33
2021-04-02T05:02:39Z
eng
BMC
BMC Medical Genomics
1755-8794
2018-11-01
111113
10.1186/s12920-018-0426-y
Department of Biological Sciences, Vanderbilt University (affiliation given for each of the six authors, indexed 0 to 5)
collection DOAJ
language English
format Article
sources DOAJ
author Haley R. Eidem
Jacob L. Steenwyk
Jennifer H. Wisecaver
John A. Capra
Patrick Abbot
Antonis Rokas
author_sort Haley R. Eidem
title integRATE: a desirability-based data integration framework for the prioritization of candidate genes across heterogeneous omics and its application to preterm birth
publisher BMC
series BMC Medical Genomics
issn 1755-8794
publishDate 2018-11-01
description Abstract. Background: The integration of high-quality, genome-wide analyses offers a robust approach to elucidating genetic factors involved in complex human diseases. Even though several methods exist to integrate heterogeneous omics data, most biologists still manually select candidate genes by examining the intersection of lists of candidates stemming from analyses of different types of omics data that have been generated by imposing hard (strict) thresholds on quantitative variables, such as P-values and fold changes, increasing the chance of missing potentially important candidates. Methods: To better facilitate the unbiased integration of heterogeneous omics data collected from diverse platforms and samples, we propose a desirability function framework for identifying candidate genes with strong evidence across data types as targets for follow-up functional analysis. Our approach is targeted towards disease systems with sparse, heterogeneous omics data, so we tested it on one such pathology: spontaneous preterm birth (sPTB). Results: We developed the software integRATE, which uses desirability functions to rank genes both within and across studies, identifying well-supported candidate genes according to the cumulative weight of biological evidence rather than based on imposition of hard thresholds of key variables. Integrating 10 sPTB omics studies identified both genes in pathways previously suspected to be involved in sPTB as well as novel genes never before linked to this syndrome. integRATE is available as an R package on GitHub (https://github.com/haleyeidem/integRATE). Conclusions: Desirability-based data integration is a solution most applicable in biological research areas where omics data is especially heterogeneous and sparse, allowing for the prioritization of candidate genes that can be used to inform more targeted downstream functional analyses.
topic Prematurity
Integrative genomics
Complex disease
Candidate gene ranking
Venn diagram
url http://link.springer.com/article/10.1186/s12920-018-0426-y
_version_ 1724172765436575744
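
The abstract describes ranking genes with desirability functions within and across studies instead of intersecting hard-thresholded lists. The following is a minimal sketch of that general idea in Python, not the integRATE R package's API. Everything in it is an assumption made for illustration: the function names, the linear desirability shapes, the cut points (P-values scaled over 0 to 1, absolute log fold changes over 0 to 3), the use of a weighted arithmetic mean to combine scores, the rule that a gene absent from a study contributes zero, and the gene names and values themselves.

```python
from typing import Dict, List, Tuple

# Hypothetical per-study data: gene -> (P-value, log fold change).
Study = Dict[str, Tuple[float, float]]


def d_low(x: float, lo: float, hi: float, scale: float = 1.0) -> float:
    """Desirability where LOW values are best (e.g. P-values).

    Returns 1.0 at or below `lo`, 0.0 at or above `hi`, and a value
    in between otherwise, so no gene is dropped by a hard cutoff.
    """
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return ((hi - x) / (hi - lo)) ** scale


def d_high(x: float, lo: float, hi: float, scale: float = 1.0) -> float:
    """Desirability where HIGH values are best (e.g. |log fold change|)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return ((x - lo) / (hi - lo)) ** scale


def within_study(study: Study) -> Dict[str, float]:
    """Score each gene in one study as the mean of its variable desirabilities.

    The cut points (0 to 1 for P-values, 0 to 3 for |log fold change|)
    are assumed values chosen only for this example.
    """
    return {
        gene: (d_low(p, 0.0, 1.0) + d_high(abs(lfc), 0.0, 3.0)) / 2.0
        for gene, (p, lfc) in study.items()
    }


def across_studies(studies: List[Study], weights: List[float]) -> List[Tuple[str, float]]:
    """Combine study-level scores into one ranking via a weighted mean.

    A gene missing from a study adds nothing for that study (assumption),
    so genes supported by more studies tend to rank higher.
    """
    total = sum(weights)
    scores: Dict[str, float] = {}
    for study, weight in zip(studies, weights):
        for gene, d in within_study(study).items():
            scores[gene] = scores.get(gene, 0.0) + weight * d / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up example values; gene names are placeholders, not real results.
study_a: Study = {"GENE1": (0.001, 2.0), "GENE2": (0.04, 0.5)}
study_b: Study = {"GENE1": (0.2, 0.3), "GENE3": (0.0001, 3.0)}

ranked = across_studies([study_a, study_b], [1.0, 1.0])
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

In this toy input, GENE1 ranks first even though its P-value of 0.2 in the second study would fail a conventional 0.05 cutoff and remove it from an intersection of thresholded lists. Its moderate evidence in both studies accumulates, which is the behaviour the abstract attributes to ranking by cumulative weight of evidence.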