A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature

The exponentially increasing size of biomedical literature and the limited ability of manual curators to discover protein–protein interactions (PPIs) in text have led to delays in keeping PPI databases updated with the current findings. The state-of-the-art text mining methods for PPI extraction are...

Bibliographic Details
Main Authors: Changqin Quan, Zhiwei Luo, Song Wang
Format: Article
Language: English
Published: MDPI AG 2020-04-01
Series: Applied Sciences
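Subjects: protein–protein interactions; deep learning (DL); convolutional neural networks (CNN); bidirectional long short-term memory (bidirectional LSTM)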
Online Access: https://www.mdpi.com/2076-3417/10/8/2690
id doaj-c6eeff98461b4bed860c7e191737d7c0
record_format Article
spelling doaj-c6eeff98461b4bed860c7e191737d7c0
2020-11-25T02:33:48Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-04-01
10
2690
2690
10.3390/app10082690
A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature
Changqin Quan (0)
Zhiwei Luo (1)
Song Wang (2)
Graduate School of System Informatics, Kobe University, 1-1, Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
Graduate School of System Informatics, Kobe University, 1-1, Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
School of Elec Eng, Comp and Math Sci; Curtin University, Kent St, Bentley WA 6102, Australia
The exponentially increasing size of biomedical literature and the limited ability of manual curators to discover protein–protein interactions (PPIs) in text have led to delays in keeping PPI databases updated with the current findings. The state-of-the-art text mining methods for PPI extraction are primarily based on deep learning (DL) models, and the performance of a DL-based method is mainly affected by the architecture of DL models and the feature embedding methods. In this study, we compared different architectures of DL models, including convolutional neural networks (CNN), long short-term memory (LSTM), and hybrid models, and proposed a hybrid architecture of a bidirectional LSTM+CNN model for PPI extraction. Pretrained word embedding and shortest dependency path (SDP) embedding are fed into a two-embedding channel model, such that the model is able to model long-distance contextual information and can capture the local features and structure information effectively. The experimental results showed that the proposed model is superior to the non-hybrid DL models, and the hybrid CNN+Bidirectional LSTM model works well for PPI extraction. The visualization and comparison of the hidden features learned by different DL models further confirmed the effectiveness of the proposed model.
https://www.mdpi.com/2076-3417/10/8/2690
protein–protein interactions
deep learning (DL)
convolutional neural networks (CNN)
bidirectional long short-term memory (bidirectional LSTM)
collection DOAJ
language English
format Article
sources DOAJ
author Changqin Quan
Zhiwei Luo
Song Wang
spellingShingle Changqin Quan
Zhiwei Luo
Song Wang
A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature
Applied Sciences
protein–protein interactions
deep learning (DL)
convolutional neural networks (CNN)
bidirectional long short-term memory (bidirectional LSTM)
author_facet Changqin Quan
Zhiwei Luo
Song Wang
author_sort Changqin Quan
title A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature
title_short A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature
title_full A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature
title_fullStr A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature
title_full_unstemmed A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature
title_sort hybrid deep learning model for protein–protein interactions extraction from biomedical literature
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2020-04-01
description The exponentially increasing size of biomedical literature and the limited ability of manual curators to discover protein–protein interactions (PPIs) in text have led to delays in keeping PPI databases updated with the current findings. The state-of-the-art text mining methods for PPI extraction are primarily based on deep learning (DL) models, and the performance of a DL-based method is mainly affected by the architecture of DL models and the feature embedding methods. In this study, we compared different architectures of DL models, including convolutional neural networks (CNN), long short-term memory (LSTM), and hybrid models, and proposed a hybrid architecture of a bidirectional LSTM+CNN model for PPI extraction. Pretrained word embedding and shortest dependency path (SDP) embedding are fed into a two-embedding channel model, such that the model is able to model long-distance contextual information and can capture the local features and structure information effectively. The experimental results showed that the proposed model is superior to the non-hybrid DL models, and the hybrid CNN+Bidirectional LSTM model works well for PPI extraction. The visualization and comparison of the hidden features learned by different DL models further confirmed the effectiveness of the proposed model.
topic protein–protein interactions
deep learning (DL)
convolutional neural networks (CNN)
bidirectional long short-term memory (bidirectional LSTM)
url https://www.mdpi.com/2076-3417/10/8/2690
_version_ 1724812426086449152
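
The abstract in this record describes a two-embedding-channel model that combines a bidirectional LSTM with a CNN, but it does not give the exact wiring. The following is a minimal pure-Python sketch of one possible reading, not the authors' implementation. Everything beyond the abstract is an assumption made for illustration: each channel (word embeddings, SDP embeddings) runs a BiLSTM first and then a 1-D convolution with max-over-time pooling, the two pooled vectors are concatenated and passed to a two-class softmax, weights are random rather than trained, biases are omitted, and all names (`LSTM`, `Channel`, `HybridPPIClassifier`) and dimensions are hypothetical.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Small random weight matrix (a stand-in for trained parameters)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(v):
    peak = max(v)
    exps = [math.exp(x - peak) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

class LSTM:
    """Single-direction LSTM (biases omitted for brevity)."""
    def __init__(self, in_dim, hid):
        self.hid = hid
        # One matrix per gate (input, forget, output, candidate), applied to [x_t, h_{t-1}].
        self.w = {gate: rand_matrix(hid, in_dim + hid) for gate in "ifoc"}

    def run(self, seq):
        h, c, outputs = [0.0] * self.hid, [0.0] * self.hid, []
        for x in seq:
            z = x + h  # list concatenation: [x_t, h_{t-1}]
            i = [sigmoid(a) for a in matvec(self.w["i"], z)]
            f = [sigmoid(a) for a in matvec(self.w["f"], z)]
            o = [sigmoid(a) for a in matvec(self.w["o"], z)]
            g = [math.tanh(a) for a in matvec(self.w["c"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs

def bilstm(fwd, bwd, seq):
    """Concatenate forward and backward hidden states at each time step."""
    left = fwd.run(seq)
    right = bwd.run(seq[::-1])[::-1]
    return [a + b for a, b in zip(left, right)]

def conv_maxpool(seq, filters, width):
    """1-D convolution over time with ReLU, then max-over-time pooling.
    Assumes len(seq) >= width."""
    windows = [sum(seq[t:t + width], []) for t in range(len(seq) - width + 1)]
    feats = [[max(0.0, a) for a in matvec(filters, w)] for w in windows]
    return [max(col) for col in zip(*feats)]

class Channel:
    """One embedding channel: BiLSTM followed by CNN (assumed ordering)."""
    def __init__(self, emb_dim, hid, n_filters, width):
        self.fwd, self.bwd = LSTM(emb_dim, hid), LSTM(emb_dim, hid)
        self.width = width
        self.filters = rand_matrix(n_filters, width * 2 * hid)

    def encode(self, seq):
        return conv_maxpool(bilstm(self.fwd, self.bwd, seq), self.filters, self.width)

class HybridPPIClassifier:
    """Two channels (word and SDP embeddings) merged before a softmax layer."""
    def __init__(self, emb_dim=8, hid=6, n_filters=5, width=3):
        self.word_channel = Channel(emb_dim, hid, n_filters, width)
        self.sdp_channel = Channel(emb_dim, hid, n_filters, width)
        self.out = rand_matrix(2, 2 * n_filters)

    def predict_proba(self, word_vecs, sdp_vecs):
        merged = self.word_channel.encode(word_vecs) + self.sdp_channel.encode(sdp_vecs)
        return softmax(matvec(self.out, merged))

# Toy inputs: random vectors standing in for pretrained word and SDP embeddings.
sentence = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(10)]
sdp = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
model = HybridPPIClassifier()
probs = model.predict_proba(sentence, sdp)  # [P(no interaction), P(interaction)]
```

In this sketch the BiLSTM supplies context from both directions at every token, and the convolution plus pooling then picks out the strongest local patterns, which mirrors the abstract's claim of modeling long-distance context and local features together. A practical version would use a deep learning framework with trained weights.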