Effective use of Latent Semantic Indexing and Computational Linguistics in Biological and Biomedical Applications
Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently increases at a rate of several thousand papers per week, making automated information retrieval methods the only feasible method of managing this expa...
Main Authors: | Hongyu Chen, Bronwen Martin, Caitlin M Daimon, Stuart Maudsley |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2013-01-01 |
Series: | Frontiers in Physiology |
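Subjects: | Data Mining; Drug Discovery; discovery; computational linguistics; molecular interactions; Latent Semantic Indexing |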
Online Access: | http://journal.frontiersin.org/Journal/10.3389/fphys.2013.00008/full |
id |
doaj-1e633ebc52024c9199a56ba0ad3e37da |
---|---|
record_format |
Article |
spelling |
doaj-1e633ebc52024c9199a56ba0ad3e37da; 2020-11-24T20:57:55Z; eng; Frontiers Media S.A.; Frontiers in Physiology; 1664-042X; 2013-01-01; 4; 10.3389/fphys.2013.00008; 38549; Effective use of Latent Semantic Indexing and Computational Linguistics in Biological and Biomedical Applications; Hongyu Chen, Bronwen Martin, Caitlin M Daimon, Stuart Maudsley (all: National Institute on Aging - National Institutes of Health); Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently increases at a rate of several thousand papers per week, making automated information retrieval methods the only feasible method of managing this expanding corpus. With the increasing prevalence of open-access journals and constant growth of publicly-available repositories of biomedical literature, literature mining has become much more effective with respect to the extraction of biomedically-relevant data. In recent years, text mining of popular databases such as MEDLINE has evolved from basic term-searches to more sophisticated natural language processing techniques, indexing and retrieval methods, structural analysis and integration of literature with associated metadata. In this review, we will focus on Latent Semantic Indexing (LSI), a computational linguistics technique increasingly used for a variety of biological purposes. It is noted for its ability to consistently outperform benchmark Boolean text searches and co-occurrence models at information retrieval and its power to extract indirect relationships within a data set. LSI has been used successfully to formulate new hypotheses, generate novel connections from existing data, and validate empirical data.; http://journal.frontiersin.org/Journal/10.3389/fphys.2013.00008/full; Data Mining; Drug Discovery; discovery; computational linguistics; molecular interactions; Latent Semantic Indexing |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Hongyu Chen, Bronwen Martin, Caitlin M Daimon, Stuart Maudsley |
title |
Effective use of Latent Semantic Indexing and Computational Linguistics in Biological and Biomedical Applications |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Physiology |
issn |
1664-042X |
publishDate |
2013-01-01 |
description |
Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently increases at a rate of several thousand papers per week, making automated information retrieval methods the only feasible method of managing this expanding corpus. With the increasing prevalence of open-access journals and constant growth of publicly-available repositories of biomedical literature, literature mining has become much more effective with respect to the extraction of biomedically-relevant data. In recent years, text mining of popular databases such as MEDLINE has evolved from basic term-searches to more sophisticated natural language processing techniques, indexing and retrieval methods, structural analysis and integration of literature with associated metadata. In this review, we will focus on Latent Semantic Indexing (LSI), a computational linguistics technique increasingly used for a variety of biological purposes. It is noted for its ability to consistently outperform benchmark Boolean text searches and co-occurrence models at information retrieval and its power to extract indirect relationships within a data set. LSI has been used successfully to formulate new hypotheses, generate novel connections from existing data, and validate empirical data. |
topic |
Data Mining; Drug Discovery; discovery; computational linguistics; molecular interactions; Latent Semantic Indexing |
url |
http://journal.frontiersin.org/Journal/10.3389/fphys.2013.00008/full |