Multilingual Word Sense Disambiguation Using Wikipedia

Ambiguity is inherent to human language. In particular, word sense ambiguity is prevalent in all natural languages, with a large number of the words in any given language carrying more than one meaning. Word sense disambiguation is the task of automatically assigning the most appropriate meaning to...


Bibliographic Details
Main Author:	Dandala, Bharath
Other Authors:	Mihalcea, Rada, 1974-
Format:	Others
Language:	English
Published:	University of North Texas 2013
Subjects:	Wikipedia word sense disambiguation supervised learning multilingual
Online Access:	https://digital.library.unt.edu/ark:/67531/metadc500036/

id	ndltd-unt.edu-info-ark-67531-metadc500036
record_format	oai_dc
spelling	ndltd-unt.edu-info-ark-67531-metadc500036 2020-07-15T07:09:31Z Multilingual Word Sense Disambiguation Using Wikipedia Dandala, Bharath Wikipedia word sense disambiguation supervised learning multilingual University of North Texas Mihalcea, Rada, 1974- Tarau, Paul Nielsen, Rodney Bunescu, Răzvan 2013-08 Thesis or Dissertation Text https://digital.library.unt.edu/ark:/67531/metadc500036/ ark: ark:/67531/metadc500036 English Public Dandala, Bharath Copyright Copyright is held by the author, unless otherwise noted. All rights Reserved.
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Wikipedia word sense disambiguation supervised learning multilingual
spellingShingle	Wikipedia word sense disambiguation supervised learning multilingual Dandala, Bharath Multilingual Word Sense Disambiguation Using Wikipedia
description	Ambiguity is inherent to human language. In particular, word sense ambiguity is prevalent in all natural languages, with a large number of the words in any given language carrying more than one meaning. Word sense disambiguation is the task of automatically assigning the most appropriate meaning to a polysemous word within a given context. Generally, the problem of resolving ambiguity in the literature has revolved around the famous quote “you shall know the meaning of the word by the company it keeps.” In this thesis, we investigate the role of context for resolving ambiguity through three different approaches. Instead of using a predefined monolingual sense inventory such as WordNet, we use a language-independent framework where the word senses and sense-tagged data are derived automatically from Wikipedia. Using Wikipedia as a source of sense annotations provides a much-needed solution to the knowledge acquisition bottleneck. In order to evaluate the viability of Wikipedia-based sense annotations, we cast the task of disambiguating polysemous nouns as a monolingual classification task and experimented on lexical samples from four different languages (viz. English, German, Italian and Spanish). The experiments confirm that the Wikipedia-based sense annotations are reliable and can be used to construct accurate monolingual sense classifiers. It is a long-held belief that exploiting multiple languages helps in building accurate word sense disambiguation systems. Subsequently, we developed two approaches that recast the task of disambiguating polysemous nouns as a multilingual classification task. The first approach for multilingual word sense disambiguation attempts to effectively use a machine translation system to leverage two relevant multilingual aspects of the semantics of text. First, the various senses of a target word may be translated into different words, which constitute a unique yet highly salient signal that effectively expands the target word’s feature space. Second, the translated context words themselves embed co-occurrence information that a translation engine gathers from very large parallel corpora. The second approach for multilingual word sense disambiguation attempts to reduce the reliance on the machine translation system during training by using the multilingual knowledge available in Wikipedia through its interlingual links. Finally, the experiments on a lexical sample from four different languages confirm that the multilingual systems perform better than the monolingual system and significantly improve the disambiguation accuracy.
author2	Mihalcea, Rada, 1974-
author_facet	Mihalcea, Rada, 1974- Dandala, Bharath
author	Dandala, Bharath
author_sort	Dandala, Bharath
title	Multilingual Word Sense Disambiguation Using Wikipedia
title_short	Multilingual Word Sense Disambiguation Using Wikipedia
title_full	Multilingual Word Sense Disambiguation Using Wikipedia
title_fullStr	Multilingual Word Sense Disambiguation Using Wikipedia
title_full_unstemmed	Multilingual Word Sense Disambiguation Using Wikipedia
title_sort	multilingual word sense disambiguation using wikipedia
publisher	University of North Texas
publishDate	2013
url	https://digital.library.unt.edu/ark:/67531/metadc500036/
work_keys_str_mv	AT dandalabharath multilingualwordsensedisambiguationusingwikipedia
_version_	1719328793149046784

Similar Items