Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity

Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise words and meanings of a language, compiled by different means. This paper, which aims to be a reference for those intere...

Bibliographic Details
Main Author: Hugo Gonçalo Oliveira
Format: Article
Language: English
Published: MDPI AG 2018-02-01
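Series: Information
Subjects: semantic similarity; word similarity; lexical knowledge bases; lexical semantics; word embeddings; distributional semantics
Online Access: http://www.mdpi.com/2078-2489/9/2/35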
id doaj-81693e5dec8e4a528299a2d1ef98100c
record_format Article
spelling doaj-81693e5dec8e4a528299a2d1ef98100c | 2020-11-24T23:07:39Z | eng | MDPI AG | Information | 2078-2489 | 2018-02-01 | 9 | 2 | 35 | 10.3390/info9020035 | info9020035 | Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity | Hugo Gonçalo Oliveira (0) | (0) Centre for Informatics and Systems of the University of Coimbra (CISUC), Department of Informatics Engineering, University of Coimbra, 3030-290 Coimbra, Portugal | Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise words and meanings of a language, compiled by different means. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches for this task and is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) for this language, available for a longer time. The previous resources were exploited to answer word similarity tests, which also became recently available for Portuguese. We conclude that there are several valid approaches for this task, but not one that outperforms all the others in every single test. Distributional models seem to capture relatedness better, while LKBs are better suited for computing genuine similarity, but, in general, better results are obtained when knowledge from different sources is combined. | http://www.mdpi.com/2078-2489/9/2/35 | semantic similarity | word similarity | lexical knowledge bases | lexical semantics | word embeddings | distributional semantics
collection DOAJ
language English
format Article
sources DOAJ
author Hugo Gonçalo Oliveira
spellingShingle Hugo Gonçalo Oliveira
Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
Information
semantic similarity
word similarity
lexical knowledge bases
lexical semantics
word embeddings
distributional semantics
author_facet Hugo Gonçalo Oliveira
author_sort Hugo Gonçalo Oliveira
title Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title_short Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title_full Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title_fullStr Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title_full_unstemmed Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title_sort distributional and knowledge-based approaches for computing portuguese word similarity
publisher MDPI AG
series Information
issn 2078-2489
publishDate 2018-02-01
description Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise words and meanings of a language, compiled by different means. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches for this task and is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) for this language, available for a longer time. The previous resources were exploited to answer word similarity tests, which also became recently available for Portuguese. We conclude that there are several valid approaches for this task, but not one that outperforms all the others in every single test. Distributional models seem to capture relatedness better, while LKBs are better suited for computing genuine similarity, but, in general, better results are obtained when knowledge from different sources is combined.
topic semantic similarity
word similarity
lexical knowledge bases
lexical semantics
word embeddings
distributional semantics
url http://www.mdpi.com/2078-2489/9/2/35