Effective use of Latent Semantic Indexing and Computational Linguistics in Biological and Biomedical Applications

Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently increases at a rate of several thousand papers per week, making automated information retrieval methods the only feasible method of managing this expa...

Bibliographic Details
Main Authors: Hongyu Chen, Bronwen Martin, Caitlin M Daimon, Stuart Maudsley
Format: Article
Language: English
Published: Frontiers Media S.A. 2013-01-01
Series: Frontiers in Physiology
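Subjects: Data Mining, Drug Discovery, discovery, computational linguistics, molecular interactions, Latent Semantic Indexing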
Online Access: http://journal.frontiersin.org/Journal/10.3389/fphys.2013.00008/full
id doaj-1e633ebc52024c9199a56ba0ad3e37da
record_format Article
spelling doaj-1e633ebc52024c9199a56ba0ad3e37da; 2020-11-24T20:57:55Z; eng; Frontiers Media S.A.; Frontiers in Physiology; 1664-042X; 2013-01-01; 4; 10.3389/fphys.2013.00008; 38549; Hongyu Chen, Bronwen Martin, Caitlin M Daimon, Stuart Maudsley (all: National Institute on Aging - National Institutes of Health)
collection DOAJ
language English
format Article
sources DOAJ
author Hongyu Chen
Bronwen Martin
Caitlin M Daimon
Stuart Maudsley
title Effective use of Latent Semantic Indexing and Computational Linguistics in Biological and Biomedical Applications
publisher Frontiers Media S.A.
series Frontiers in Physiology
issn 1664-042X
publishDate 2013-01-01
description Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently increases at a rate of several thousand papers per week, making automated information retrieval methods the only feasible method of managing this expanding corpus. With the increasing prevalence of open-access journals and constant growth of publicly-available repositories of biomedical literature, literature mining has become much more effective with respect to the extraction of biomedically-relevant data. In recent years, text mining of popular databases such as MEDLINE has evolved from basic term-searches to more sophisticated natural language processing techniques, indexing and retrieval methods, structural analysis and integration of literature with associated metadata. In this review, we will focus on Latent Semantic Indexing (LSI), a computational linguistics technique increasingly used for a variety of biological purposes. It is noted for its ability to consistently outperform benchmark Boolean text searches and co-occurrence models at information retrieval and its power to extract indirect relationships within a data set. LSI has been used successfully to formulate new hypotheses, generate novel connections from existing data, and validate empirical data.
topic Data Mining
Drug Discovery
discovery
computational linguistics
molecular interactions
Latent Semantic Indexing
url http://journal.frontiersin.org/Journal/10.3389/fphys.2013.00008/full