Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity

Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise words and meanings of a language, compiled by different means. This paper, which aims to be a reference for those intere...

Bibliographic Details
Main Author:	Hugo Gonçalo Oliveira
Format:	Article
Language:	English
Published:	MDPI AG 2018-02-01
Series:	Information
Subjects:	semantic similarity; word similarity; lexical knowledge bases; lexical semantics; word embeddings; distributional semantics
Online Access:	http://www.mdpi.com/2078-2489/9/2/35

id	doaj-81693e5dec8e4a528299a2d1ef98100c
record_format	Article
doi	10.3390/info9020035
author_affiliation	Centre for Informatics and Systems of the University of Coimbra (CISUC), Department of Informatics Engineering, University of Coimbra, 3030-290 Coimbra, Portugal
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Hugo Gonçalo Oliveira
title	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
publisher	MDPI AG
series	Information
issn	2078-2489
publishDate	2018-02-01
description	Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise the words and meanings of a language and that are compiled by different means. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches for this task. It is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) for this language that have been available for longer. These resources were exploited to answer word similarity tests, which have also recently become available for Portuguese. We conclude that there are several valid approaches for this task, but none that outperforms all the others in every test. Distributional models seem to capture relatedness better, while LKBs are better suited for computing genuine similarity; in general, however, better results are obtained when knowledge from different sources is combined.
topic	semantic similarity; word similarity; lexical knowledge bases; lexical semantics; word embeddings; distributional semantics
url	http://www.mdpi.com/2078-2489/9/2/35
