Application of TF-IDF factor in the semantic analysis of a documentary collection
Objective. This paper describes the application of a tool for the semantic analysis of a document collection based on term frequency–inverse document frequency (TF-IDF). Methodology. A system based on PHP and a MySQL database for t...
Main Authors: | Andrés Vuotto, Celeste Bogetti, Gladys Fernández |
---|---|
Format: | Article |
Language: | Spanish |
Published: | University Library System, University of Pittsburgh, 2015-11-01 |
Series: | Biblios |
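Subjects: | Semantic analysis; TF-IDF; Information retrieval; Data mining; Information extraction from databases |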
Online Access: | http://biblios.pitt.edu/ojs/index.php/biblios/article/view/227 |
id |
doaj-d1069febd04641b78766ca7c11ade8e7 |
---|---|
record_format |
Article |
spelling |
doaj-d1069febd04641b78766ca7c11ade8e7; 2020-11-25T01:08:02Z; spa; University Library System, University of Pittsburgh; Biblios; 1562-4730; 2015-11-01; 060113; 10.5195/biblios.2015.227; 148; Application of TF-IDF factor in the semantic analysis of a documentary collection; Andrés Vuotto 0; Celeste Bogetti 1; Gladys Fernández 2; Universidad Nacional de Mar del Plata - MDP; Universidad Nacional de Mar del Plata - MDP; Universidad Nacional de Mar del Plata - MDP; Objective. This paper describes the application of a tool for the semantic analysis of a document collection based on term frequency–inverse document frequency (TF-IDF). Methodology. A system based on PHP and a MySQL database was developed for managing a thesaurus, calculating TF-IDF (as an indicator of semantic weight), and building a relevance tree (made up of the concepts most relevant to the topic analyzed). The tool was tested on the semantic analysis of a document collection in Psychology. Results. The system was able to identify the level of presence of one topic, professional ethics, in a collection of documents from a Psychology program. Conclusions. The experience described confirms the viability of the tool for the semantic analysis of a document collection. It underlines the relevance and capacity of information professionals to develop this kind of tool for processing information. The authors suggest a specific technical approach to the use of scripts and information flows.; http://biblios.pitt.edu/ojs/index.php/biblios/article/view/227; Semantic analysis; TF-IDF; Information retrieval; Data mining; Information extraction from databases |
collection |
DOAJ |
language |
Spanish |
format |
Article |
sources |
DOAJ |
author |
Andrés Vuotto; Celeste Bogetti; Gladys Fernández |
spellingShingle |
Andrés Vuotto; Celeste Bogetti; Gladys Fernández; Application of TF-IDF factor in the semantic analysis of a documentary collection; Biblios; Semantic analysis; TF-IDF; Information retrieval; Data mining; Information extraction from databases |
author_facet |
Andrés Vuotto; Celeste Bogetti; Gladys Fernández |
author_sort |
Andrés Vuotto |
title |
Application of TF-IDF factor in the semantic analysis of a documentary collection |
title_short |
Application of TF-IDF factor in the semantic analysis of a documentary collection |
title_full |
Application of TF-IDF factor in the semantic analysis of a documentary collection |
title_fullStr |
Application of TF-IDF factor in the semantic analysis of a documentary collection |
title_full_unstemmed |
Application of TF-IDF factor in the semantic analysis of a documentary collection |
title_sort |
application of tf-idf factor in the semantic analysis of a documentary collection |
publisher |
University Library System, University of Pittsburgh |
series |
Biblios |
issn |
1562-4730 |
publishDate |
2015-11-01 |
description |
Objective. This paper describes the application of a tool for the semantic analysis of a document collection based on term frequency–inverse document frequency (TF-IDF). Methodology. A system based on PHP and a MySQL database was developed for managing a thesaurus, calculating TF-IDF (as an indicator of semantic weight), and building a relevance tree (made up of the concepts most relevant to the topic analyzed). The tool was tested on the semantic analysis of a document collection in Psychology. Results. The system was able to identify the level of presence of one topic, professional ethics, in a collection of documents from a Psychology program. Conclusions. The experience described confirms the viability of the tool for the semantic analysis of a document collection. It underlines the relevance and capacity of information professionals to develop this kind of tool for processing information. The authors suggest a specific technical approach to the use of scripts and information flows. |
topic |
Semantic analysis; TF-IDF; Information retrieval; Data mining; Information extraction from databases |
url |
http://biblios.pitt.edu/ojs/index.php/biblios/article/view/227 |
work_keys_str_mv |
AT andresvuotto applicationoftfidffactorinthesemanticanalysisofadocumentarycollection AT celestebogetti applicationoftfidffactorinthesemanticanalysisofadocumentarycollection AT gladysfernandez applicationoftfidffactorinthesemanticanalysisofadocumentarycollection |
_version_ |
1725184707028582400 |
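Note on TF-IDF. The description above treats TF-IDF as an indicator of the semantic weight of a thesaurus term in each document. The following is a minimal Python sketch of that idea, not the authors' PHP and MySQL system. The function name `tf_idf`, the sample documents, the whitespace tokenizer, the restriction to single-word terms, and the choice of the common `tf * log(N / df)` variant are all assumptions made for this illustration; the record does not say which variant the paper uses.

```python
import math
from collections import Counter


def tf_idf(documents, terms):
    """Return {term: [weight per document]} using tf * log(N / df).

    tf is the term count divided by the document length, and df is the
    number of documents containing the term. Terms are assumed to be
    single lowercase words (a simplification for this sketch).
    """
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    counts = [Counter(tokens) for tokens in tokenized]

    weights = {}
    for term in terms:
        # Document frequency: in how many documents the term occurs.
        df = sum(1 for c in counts if c[term] > 0)
        # A term that occurs nowhere gets weight 0 instead of dividing by 0.
        idf = math.log(n_docs / df) if df else 0.0
        weights[term] = [
            (c[term] / len(tokens) if tokens else 0.0) * idf
            for c, tokens in zip(counts, tokenized)
        ]
    return weights


# Hypothetical three-document collection and two thesaurus terms.
docs = [
    "ethics in professional practice and ethics codes",
    "clinical assessment methods",
    "research methods and statistics",
]
weights = tf_idf(docs, ["ethics", "methods"])
for term, per_doc in weights.items():
    print(term, [round(w, 4) for w in per_doc])
```

In this sketch a term that appears in only one document (here "ethics") receives a higher weight there than a term spread across several documents, which is the property that makes TF-IDF usable as a per-document measure of how strongly a topic is present.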