Systematic interrogation of diverse Omic data reveals interpretable, robust, and generalizable transcriptomic features of clinically successful therapeutic targets.

Target selection is the first and pivotal step in drug discovery. An incorrect choice may not manifest itself for many years after hundreds of millions of research dollars have been spent. We collected a set of 332 targets that succeeded or failed in phase III clinical trials, and explored whether O...

Bibliographic Details
Main Authors: Andrew D Rouillard, Mark R Hurle, Pankaj Agarwal
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-05-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC5983857?pdf=render
id doaj-a50b6998156f452b82f96d743f2c3951
record_format Article
spelling doaj-a50b6998156f452b82f96d743f2c3951; 2020-11-25T02:12:16Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2018-05-01; 14; 5; e1006142; 10.1371/journal.pcbi.1006142
collection DOAJ
language English
format Article
sources DOAJ
author Andrew D Rouillard
Mark R Hurle
Pankaj Agarwal
title Systematic interrogation of diverse Omic data reveals interpretable, robust, and generalizable transcriptomic features of clinically successful therapeutic targets.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2018-05-01
description Target selection is the first and pivotal step in drug discovery. An incorrect choice may not manifest itself for many years after hundreds of millions of research dollars have been spent. We collected a set of 332 targets that succeeded or failed in phase III clinical trials, and explored whether Omic features describing the target genes could predict clinical success. We obtained features from the recently published comprehensive resource: Harmonizome. Nineteen features appeared to be significantly correlated with phase III clinical trial outcomes, but only 4 passed validation schemes that used bootstrapping or modified permutation tests to assess feature robustness and generalizability while accounting for target class selection bias. We also used classifiers to perform multivariate feature selection and found that classifiers with a single feature performed as well in cross-validation as classifiers with more features (AUROC = 0.57 and AUPR = 0.81). The two predominantly selected features were mean mRNA expression across tissues and standard deviation of expression across tissues, where successful targets tended to have lower mean expression and higher expression variance than failed targets. This finding supports the conventional wisdom that it is favorable for a target to be present in the tissue(s) affected by a disease and absent from other tissues. Overall, our results suggest that it is feasible to construct a model integrating interpretable target features to inform target selection. We anticipate deeper insights and better models in the future, as researchers can reuse the data we have provided to improve methods for handling sample biases and learn more informative features. Code, documentation, and data for this study have been deposited on GitHub at https://github.com/arouillard/omic-features-successful-targets.