LINNAEUS: A species name identification system for biomedical literature
Abstract. Background: The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document re...
Main Authors: | Nenadic Goran, Gerner Martin, Bergman Casey M |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2010-02-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/11/85 |
id |
doaj-4526a1fd91774d40971cb6d8679a98f3 |
---|---|
record_format |
Article |
spelling |
doaj-4526a1fd91774d40971cb6d8679a98f3 2020-11-24T20:54:14Z eng BMC BMC Bioinformatics 1471-2105 2010-02-01 11 1 85 10.1186/1471-2105-11-85 LINNAEUS: A species name identification system for biomedical literature Nenadic Goran Gerner Martin Bergman Casey M http://www.biomedcentral.com/1471-2105/11/85 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Nenadic Goran Gerner Martin Bergman Casey M |
spellingShingle |
Nenadic Goran Gerner Martin Bergman Casey M LINNAEUS: A species name identification system for biomedical literature BMC Bioinformatics |
author_facet |
Nenadic Goran Gerner Martin Bergman Casey M |
author_sort |
Nenadic Goran |
title |
LINNAEUS: A species name identification system for biomedical literature |
title_short |
LINNAEUS: A species name identification system for biomedical literature |
title_full |
LINNAEUS: A species name identification system for biomedical literature |
title_fullStr |
LINNAEUS: A species name identification system for biomedical literature |
title_full_unstemmed |
LINNAEUS: A species name identification system for biomedical literature |
title_sort |
linnaeus: a species name identification system for biomedical literature |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2010-02-01 |
description |
Abstract. Background: The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles. Results: In this paper we describe an open-source species name recognition and normalization software system, LINNAEUS, and evaluate its performance relative to several automatically generated biomedical corpora, as well as a novel corpus of full-text documents manually annotated for species mentions. LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions. When compared against our manually annotated corpus, LINNAEUS performs with 94% recall and 97% precision at the mention level, and 98% recall and 90% precision at the document level. Our system successfully solves the problem of disambiguating uncertain species mentions, with 97% of all mentions in PubMed Central full-text documents resolved to unambiguous NCBI taxonomy identifiers. Conclusions: LINNAEUS is an open source, stand-alone software system capable of recognizing and normalizing species name mentions with speed and accuracy, and can therefore be integrated into a range of bioinformatics and text-mining applications. The software and manually annotated corpus can be downloaded freely at http://linnaeus.sourceforge.net/. |
url |
http://www.biomedcentral.com/1471-2105/11/85 |
work_keys_str_mv |
AT nenadicgoran linnaeusaspeciesnameidentificationsystemforbiomedicalliterature AT gernermartin linnaeusaspeciesnameidentificationsystemforbiomedicalliterature AT bergmancaseym linnaeusaspeciesnameidentificationsystemforbiomedicalliterature |
_version_ |
1716795195144536064 |
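
The abstract in this record describes a dictionary-based matcher, implemented as a deterministic finite-state automaton, that maps species mentions to NCBI taxonomy identifiers. The Python sketch below illustrates that general idea with a character-level trie and longest-match scanning. It is not the LINNAEUS implementation (the record does not describe it in that detail). The names `SPECIES_TERMS`, `build_trie`, and `find_mentions` are hypothetical, and the tiny dictionary and its taxonomy IDs are illustrative and should be checked against the NCBI Taxonomy database before reuse.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mini-dictionary (an assumption for illustration only):
# lowercase term -> candidate NCBI taxonomy IDs. More than one ID marks
# a term as ambiguous.
SPECIES_TERMS: Dict[str, Set[int]] = {
    "homo sapiens": {9606},
    "human": {9606},
    "mus musculus": {10090},
    "mouse": {10090},
    "drosophila melanogaster": {7227},
    "drosophila": {7215, 7227},  # illustrative ambiguous entry
}

IDS_KEY = "$ids"  # multi-character key, so it never collides with a single character


def build_trie(terms: Dict[str, Set[int]]) -> dict:
    """Build a character-level trie; each accepting node stores its candidate IDs."""
    root: dict = {}
    for term, ids in terms.items():
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[IDS_KEY] = set(ids)
    return root


def find_mentions(text: str, trie: dict) -> List[Tuple[int, int, str, List[int]]]:
    """Return (start, end, matched_text, sorted_ids) for each longest match.

    Assumes lowercasing does not change the string length (true for ASCII text).
    """
    lowered = text.lower()
    n = len(lowered)
    mentions = []
    i = 0
    while i < n:
        # Only start a match at a word boundary.
        if i > 0 and lowered[i - 1].isalnum():
            i += 1
            continue
        node, j, best = trie, i, None
        while j < n and lowered[j] in node:
            node = node[lowered[j]]
            j += 1
            # Accept only if the match also ends at a word boundary.
            if IDS_KEY in node and (j == n or not lowered[j].isalnum()):
                best = (j, node[IDS_KEY])
        if best is not None:
            end, ids = best
            mentions.append((i, end, text[i:end], sorted(ids)))
            i = end
        else:
            i += 1
    return mentions


trie = build_trie(SPECIES_TERMS)
for start, end, name, ids in find_mentions(
    "Human and mouse genes differ from Drosophila melanogaster genes.", trie
):
    status = "ambiguous" if len(ids) > 1 else "unambiguous"
    print(start, end, name, ids, status)
```

A mention that returns more than one candidate ID is where disambiguation heuristics of the kind the abstract mentions would apply. This sketch only flags such mentions and does not attempt to resolve them.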