Application of TF-IDF factor in the semantic analysis of a documentary collection

Bibliographic Details
Main Authors: Andrés Vuotto, Celeste Bogetti, Gladys Fernández
Format: Article
Language: Spanish
Published: University Library System, University of Pittsburgh, 2015-11-01
Series: Biblios
Subjects: Semantic analysis; TF-IDF; Information retrieval; Data mining; Information extraction from databases
Online Access: http://biblios.pitt.edu/ojs/index.php/biblios/article/view/227
DOI: 10.5195/biblios.2015.227
ISSN: 1562-4730
Affiliation: Universidad Nacional de Mar del Plata - MDP (all three authors)
Source: DOAJ, record doaj-d1069febd04641b78766ca7c11ade8e7

Abstract: <strong>Objective</strong>. This paper describes the application of a tool for the semantic analysis of a document collection based on term frequency–inverse document frequency (TF-IDF). <strong>Methodology</strong>. A system was developed with PHP and a MySQL database to manage a thesaurus, calculate TF-IDF as an indicator of semantic weight, and build a relevance tree made up of the concepts most relevant to the topic analyzed. The tool was tested on the semantic analysis of a document collection in Psychology. <strong>Results</strong>. The system was able to identify the level of presence of the "professional ethics" track in a collection of documents from the Psychology program. <strong>Conclusions</strong>. The experience described confirms the viability of the tool for the semantic analysis of a document collection. It underlines the relevance and capabilities of information professionals in developing this kind of information-processing tool. The authors suggest a specific technical approach to the use of scripts and information flows.
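The abstract names TF-IDF as the indicator of semantic weight but does not state which variant the system uses. The following is a minimal sketch in Python, not the authors' PHP/MySQL implementation. It assumes the common textbook form tf(t, d) × log(N / df(t)), with tf normalized by document length, N the number of documents and df(t) the number of documents containing term t. The tokenizer and the `concept_presence` helper are hypothetical: the helper sums the weights of the terms a thesaurus might group under one concept, such as a "professional ethics" track.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplistic tokenizer (assumption): lowercase, keep runs of Unicode word
    # characters. No stop-word removal or stemming is applied.
    return re.findall(r"\w+", text.lower())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """One {term: weight} dict per document, with
    tf(t, d) = count(t, d) / len(d) and idf(t) = log(N / df(t))."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # df(t): number of documents that contain term t at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def concept_presence(weights: list[dict[str, float]], concept_terms: set[str]) -> list[float]:
    # Hypothetical aggregation: per document, sum the TF-IDF weights of the
    # terms that a thesaurus groups under one concept.
    return [sum(w.get(term, 0.0) for term in concept_terms) for w in weights]


if __name__ == "__main__":
    docs = [
        "ética profesional y secreto profesional",
        "evaluación psicológica",
        "ética en la investigación",
    ]
    scores = concept_presence(tf_idf(docs), {"ética", "profesional"})
    print([round(s, 3) for s in scores])
```

In this toy collection, the first document mentions both concept terms and scores highest, and the second scores 0.0. A term that occurs in every document gets idf = log(1) = 0, which is how TF-IDF downweights vocabulary shared across the whole collection. A production system for Spanish texts would likely also add stop-word removal and stemming or lemmatization, which this sketch omits.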