Integrating text-mining approaches to identify entities and extract events from the biomedical literature

The amount of biomedical literature available is increasing at an exponential rate and is becoming increasingly difficult to navigate. Text-mining methods can potentially mitigate this problem, through the systematic and large-scale extraction of structured information from inherently unstructured b...

Bibliographic Details
Main Author: Gerner, Lars Martin Anders
Other Authors: Bergman, Casey; Nenadic, Goran
Published: University of Manchester 2012
Subjects: Biomedical text mining
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.553476
id ndltd-bl.uk-oai-ethos.bl.uk-553476
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-553476
2017-07-25T03:24:58Z
https://www.research.manchester.ac.uk/portal/en/theses/integrating-textmining-approaches-to-identify-entities-and-extract-events-from-the-biomedical-literature(44f8e79a-3782-4687-85c7-eee1fda5cb76).html
Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 006.312
Biomedical text mining
description The amount of biomedical literature available is increasing at an exponential rate and is becoming increasingly difficult to navigate. Text-mining methods can potentially mitigate this problem through the systematic, large-scale extraction of structured information from inherently unstructured biomedical text. This thesis reports the development of four text-mining systems that, by building on each other, have enabled the extraction of information about a large number of published statements in the biomedical literature. The first system, LINNAEUS, enables highly accurate detection ('recognition') and identification ('normalization') of species names in biomedical articles. Building on LINNAEUS, we implemented a range of improvements in the GNAT system, enabling high-throughput gene/protein detection and identification. Using gene/protein identification from GNAT, we developed the Gene Expression Text Miner (GETM), which extracts information about gene expression statements. Finally, building on GETM as a pilot project, we constructed the BioContext integrated event extraction system, which was used to extract information about over 11 million distinct biomolecular processes in 10.9 million abstracts and 230,000 full-text articles. The ability to detect negated statements in the BioContext system enables the preliminary analysis of potential contradictions in the biomedical literature. All tools (LINNAEUS, GNAT, GETM, and BioContext) are available under open-source software licenses, and LINNAEUS and GNAT are available as online web services. All extracted data (36 million BioContext statements, 720,000 GETM statements, 72,000 contradictions, 37 million mentions of species names, 80 million mentions of gene names, and 57 million mentions of anatomical location names) is available for bulk download. In addition, the data extracted by GETM and BioContext is available to biologists through easy-to-use search interfaces.
author2 Bergman, Casey; Nenadic, Goran
author Gerner, Lars Martin Anders
title Integrating text-mining approaches to identify entities and extract events from the biomedical literature
publisher University of Manchester
publishDate 2012
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.553476