Predicting chemical ecotoxicity by learning latent space chemical representations

Bibliographic Details
Main Authors: Baccarelli, A.A (Author), Gao, F. (Author), Shen, Y. (Author), Zhang, W. (Author)
Format: Article
Language: English
Published: Elsevier Ltd 2022
Description
Summary: In silico prediction of chemical ecotoxicity (HC50) represents an important complement to in vivo and in vitro toxicological assessment of manufactured chemicals. Recent applications of machine learning models to predict chemical HC50 have yielded variable prediction performance, which depends on effectively learning chemical representations from high-dimensional data. To improve HC50 prediction performance, we developed an autoencoder model that learns latent-space chemical embeddings. This novel approach achieved state-of-the-art HC50 prediction performance, with R² of 0.668 ± 0.003 and mean absolute error (MAE) of 0.572 ± 0.001, and outperformed other dimensionality reduction methods, including principal component analysis (PCA) (R² = 0.601 ± 0.031 and MAE = 0.629 ± 0.005), kernel PCA (R² = 0.631 ± 0.008 and MAE = 0.625 ± 0.006), and uniform manifold approximation and projection (UMAP) (R² = 0.400 ± 0.008 and MAE = 0.801 ± 0.002). A simple linear layer using the chemical embeddings learned by the autoencoder also performed better than random forest (R² = 0.663 ± 0.007 and MAE = 0.591 ± 0.008), a fully connected neural network (R² = 0.614 ± 0.016 and MAE = 0.610 ± 0.008), the least absolute shrinkage and selection operator (LASSO) (R² = 0.617 ± 0.037 and MAE = 0.619 ± 0.007), and ridge regression (R² = 0.638 ± 0.007 and MAE = 0.613 ± 0.005), all of which were trained on the raw, unlearned input features. Our results highlight the usefulness of learning latent chemical representations, and our autoencoder model provides an alternative approach for robust HC50 prediction. © 2022 The Author(s)
ISSN: 0160-4120
DOI: 10.1016/j.envint.2022.107224
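The two-stage idea in the summary (learn a compressed latent embedding of high-dimensional chemical descriptors with an autoencoder, then predict the target with a simple linear layer on that embedding, scored by R² and MAE) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the one-hidden-layer tanh architecture, latent size, learning rate, the synthetic data, and every function name (`train_autoencoder`, `encode`, `fit_linear_head`, `predict_linear_head`, `r2_score`, `mean_absolute_error`) are illustrative assumptions. The record does not say whether the linear layer was trained jointly with the autoencoder; this sketch assumes it is fitted afterwards by ordinary least squares on frozen embeddings.

```python
import numpy as np


def train_autoencoder(X, latent_dim=4, lr=0.005, epochs=500, seed=0):
    """Hypothetical helper: fit a one-hidden-layer autoencoder (tanh encoder,
    linear decoder) by full-batch gradient descent on half the mean squared
    reconstruction error. Architecture and hyperparameters are assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = {
        "W_enc": rng.normal(0.0, 0.1, size=(d, latent_dim)),
        "b_enc": np.zeros(latent_dim),
        "W_dec": rng.normal(0.0, 0.1, size=(latent_dim, d)),
        "b_dec": np.zeros(d),
    }
    history = []
    for _ in range(epochs):
        Z = np.tanh(X @ params["W_enc"] + params["b_enc"])  # latent embeddings
        X_hat = Z @ params["W_dec"] + params["b_dec"]       # reconstruction
        residual = X_hat - X
        history.append(0.5 * np.sum(residual ** 2) / n)
        G = residual / n                                    # dLoss/dX_hat
        # Backpropagate through the decoder, then through tanh and the encoder.
        dpre = (G @ params["W_dec"].T) * (1.0 - Z ** 2)
        grads = {
            "W_dec": Z.T @ G,
            "b_dec": G.sum(axis=0),
            "W_enc": X.T @ dpre,
            "b_enc": dpre.sum(axis=0),
        }
        for name in params:
            params[name] -= lr * grads[name]
    return params, history


def encode(params, X):
    """Map raw descriptors to the learned latent space."""
    return np.tanh(X @ params["W_enc"] + params["b_enc"])


def fit_linear_head(Z, y):
    """Assumption: ordinary least squares on frozen embeddings plus an intercept."""
    A = np.hstack([Z, np.ones((Z.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear_head(coef, Z):
    """Apply the fitted linear layer (weights followed by intercept)."""
    return np.hstack([Z, np.ones((Z.shape[0], 1))]) @ coef


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between targets and predictions."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic stand-in data (NOT the paper's dataset): 20 correlated
# "descriptors" driven by 3 hidden factors, and a target tied to those factors.
rng = np.random.default_rng(1)
factors = rng.normal(size=(200, 3))
X = factors @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(200, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each descriptor
y = factors @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=200)

params, history = train_autoencoder(X)
Z = encode(params, X)
coef = fit_linear_head(Z, y)
y_pred = predict_linear_head(coef, Z)
print(f"reconstruction loss: {history[0]:.3f} -> {history[-1]:.3f}")
print(f"in-sample R2 = {r2_score(y, y_pred):.3f}, MAE = {mean_absolute_error(y, y_pred):.3f}")
```

Replacing `encode` with a PCA projection of `X` would give a baseline of the kind the summary compares against. Scores from this toy, in-sample run say nothing about HC50 performance; a real comparison would use held-out data and repeated runs.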