Mining microbe–disease interactions from literature via a transfer learning model

Abstract. Background: Interactions between microbes and diseases are of great importance for biomedical research. However, a large number of microbe–disease interactions are hidden in the biomedical literature, and structured databases for microbe–disease interactions are limited in number. In this paper, we...

Bibliographic Details
Main Authors: Chengkun Wu, Xinyi Xiao, Canqun Yang, JinXiang Chen, Jiacai Yi, Yanlong Qiu
Format: Article
Language: English
Published: BMC 2021-09-01
Series: BMC Bioinformatics
Subjects: Microbe–disease interactions; Named-entity recognition; Relation extraction; Transfer learning
Online Access: https://doi.org/10.1186/s12859-021-04346-7
id doaj-5126382b95f14defa421c0e00f08b7f0
record_format Article
spelling doaj-5126382b95f14defa421c0e00f08b7f0 2021-09-12T11:13:25Z eng BMC BMC Bioinformatics 1471-2105 2021-09-01 22 1 1 15 10.1186/s12859-021-04346-7
Mining microbe–disease interactions from literature via a transfer learning model
Chengkun Wu, State Key Laboratory of High-Performance Computing, National University of Defense Technology
Xinyi Xiao, College of Computer, National University of Defense Technology
Canqun Yang, College of Computer, National University of Defense Technology
JinXiang Chen, Department of General Surgery, Xiangya Hospital, Central South University
Jiacai Yi, College of Computer, National University of Defense Technology
Yanlong Qiu, College of Computer, National University of Defense Technology
Abstract. Background: Interactions between microbes and diseases are of great importance for biomedical research. However, a large number of microbe–disease interactions are hidden in the biomedical literature, and structured databases for microbe–disease interactions are limited in number. In this paper, we aim to construct a large-scale database for microbe–disease interactions automatically. We attained this goal by applying text mining methods based on a deep learning model, at a moderate curation cost. We also built a user-friendly web interface that allows researchers to navigate and query the required information. Results: First, we manually constructed a gold-standard corpus and a silver-standard corpus (SSC) for microbe–disease interactions for curation. Moreover, we proposed a text mining framework for microbe–disease interaction extraction based on a pretrained model, BERE. We applied named entity recognition tools to detect microbe and disease mentions in free biomedical text. After that, we fine-tuned the pretrained model BERE, which was originally built for drug–target interactions or drug–drug interactions, to recognize relations between the targeted entities. The introduction of the SSC for model fine-tuning greatly improved detection performance for microbe–disease interactions, with an average reduction in error of approximately 10%. The MDIDB website offers data browsing, custom searching for specific diseases or microbes, and batch downloading. Conclusions: Evaluation results demonstrate that our method outperforms the baseline model (rule-based PKDE4J) with an average $F_1$-score of 73.81%. For further validation, we randomly sampled nearly 1000 interactions predicted by our model and manually checked the correctness of each interaction, which gave an accuracy of 73%. The MDIDB website is freely available through http://dbmdi.com/index/
https://doi.org/10.1186/s12859-021-04346-7
Microbe–disease interactions
Named-entity recognition
Relation extraction
Transfer learning
collection DOAJ
language English
format Article
sources DOAJ
author Chengkun Wu
Xinyi Xiao
Canqun Yang
JinXiang Chen
Jiacai Yi
Yanlong Qiu
spellingShingle Chengkun Wu
Xinyi Xiao
Canqun Yang
JinXiang Chen
Jiacai Yi
Yanlong Qiu
Mining microbe–disease interactions from literature via a transfer learning model
BMC Bioinformatics
Microbe–disease interactions
Named-entity recognition
Relation extraction
Transfer learning
author_facet Chengkun Wu
Xinyi Xiao
Canqun Yang
JinXiang Chen
Jiacai Yi
Yanlong Qiu
author_sort Chengkun Wu
title Mining microbe–disease interactions from literature via a transfer learning model
title_short Mining microbe–disease interactions from literature via a transfer learning model
title_full Mining microbe–disease interactions from literature via a transfer learning model
title_fullStr Mining microbe–disease interactions from literature via a transfer learning model
title_full_unstemmed Mining microbe–disease interactions from literature via a transfer learning model
title_sort mining microbe–disease interactions from literature via a transfer learning model
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-09-01
description Abstract. Background: Interactions between microbes and diseases are of great importance for biomedical research. However, a large number of microbe–disease interactions are hidden in the biomedical literature, and structured databases for microbe–disease interactions are limited in number. In this paper, we aim to construct a large-scale database for microbe–disease interactions automatically. We attained this goal by applying text mining methods based on a deep learning model, at a moderate curation cost. We also built a user-friendly web interface that allows researchers to navigate and query the required information. Results: First, we manually constructed a gold-standard corpus and a silver-standard corpus (SSC) for microbe–disease interactions for curation. Moreover, we proposed a text mining framework for microbe–disease interaction extraction based on a pretrained model, BERE. We applied named entity recognition tools to detect microbe and disease mentions in free biomedical text. After that, we fine-tuned the pretrained model BERE, which was originally built for drug–target interactions or drug–drug interactions, to recognize relations between the targeted entities. The introduction of the SSC for model fine-tuning greatly improved detection performance for microbe–disease interactions, with an average reduction in error of approximately 10%. The MDIDB website offers data browsing, custom searching for specific diseases or microbes, and batch downloading. Conclusions: Evaluation results demonstrate that our method outperforms the baseline model (rule-based PKDE4J) with an average $F_1$-score of 73.81%. For further validation, we randomly sampled nearly 1000 interactions predicted by our model and manually checked the correctness of each interaction, which gave an accuracy of 73%. The MDIDB website is freely available through http://dbmdi.com/index/
topic Microbe–disease interactions
Named-entity recognition
Relation extraction
Transfer learning
url https://doi.org/10.1186/s12859-021-04346-7
work_keys_str_mv AT chengkunwu miningmicrobediseaseinteractionsfromliteratureviaatransferlearningmodel
AT xinyixiao miningmicrobediseaseinteractionsfromliteratureviaatransferlearningmodel
AT canqunyang miningmicrobediseaseinteractionsfromliteratureviaatransferlearningmodel
AT jinxiangchen miningmicrobediseaseinteractionsfromliteratureviaatransferlearningmodel
AT jiacaiyi miningmicrobediseaseinteractionsfromliteratureviaatransferlearningmodel
AT yanlongqiu miningmicrobediseaseinteractionsfromliteratureviaatransferlearningmodel
_version_ 1717755812542152704
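
Illustrative note on the abstract: the record describes a two-stage pipeline (entity recognition, then relation classification over the detected microbe and disease mentions) evaluated by F1-score. The sketch below is not the authors' code and does not use BERE or PKDE4J. It is a minimal stand-in, assuming hypothetical toy lexicons (MICROBES, DISEASES) in place of real NER tools and an invented two-sentence example text. It shows how sentence-level co-occurrence might produce candidate pairs, and how precision, recall, and F1 could be computed against a hand-labelled set.

```python
import re
from itertools import product

# Hypothetical toy lexicons standing in for real NER tools (not from the paper).
MICROBES = {"helicobacter pylori", "escherichia coli"}
DISEASES = {"gastric cancer", "peptic ulcer"}


def find_mentions(sentence, lexicon):
    """Return the lexicon terms that occur in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return sorted(term for term in lexicon if term in lowered)


def candidate_pairs(text):
    """Yield (sentence, microbe, disease) for every pair co-occurring in a sentence."""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        microbes = find_mentions(sentence, MICROBES)
        diseases = find_mentions(sentence, DISEASES)
        for microbe, disease in product(microbes, diseases):
            yield sentence, microbe, disease


def f1_score(predicted, gold):
    """Compute (precision, recall, F1) over sets of (microbe, disease) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example text; the second sentence states the absence of a link.
text = (
    "Helicobacter pylori infection is a major cause of gastric cancer. "
    "Escherichia coli was not linked to peptic ulcer in this cohort."
)
pairs = {(microbe, disease) for _, microbe, disease in candidate_pairs(text)}
gold = {("helicobacter pylori", "gastric cancer")}  # hand-labelled for this toy text
precision, recall, f1 = f1_score(pairs, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Co-occurrence alone accepts the negated second sentence as an interaction, which lowers precision. Filtering such candidates is the role a trained relation classifier would play in a pipeline of this kind.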