The data-driven Bulgarian WordNet: BTBWN

The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both the syntactic and the lexical res...

Bibliographic Details
Main Authors:	Petya Osenova, Kiril Simov
Format:	Article
Language:	English
Published:	Institute of Slavic Studies, Polish Academy of Sciences 2018-12-01
Series:	Cognitive Studies | Études cognitives
Subjects:	Bulgarian WordNet; WordNet mappings; data-driven WordNet construction
Online Access:	https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1713

id	doaj-ce6246b817764d8fbf9eda82ce69775c
record_format	Article
spelling	doaj-ce6246b817764d8fbf9eda82ce69775c; 2020-11-24T21:52:39Z; eng; Institute of Slavic Studies, Polish Academy of Sciences; Cognitive Studies | Études cognitives; 2392-2397; 2018-12-01; 018; 10.11649/cs.1713; 1328; The data-driven Bulgarian WordNet: BTBWN; Petya Osenova; 0; Kiril Simov; 1; Институт по информационни и комуникационни технологии, Българска академия на науките [Institute of Information and Communication Technologies, Bulgarian Academy of Sciences], София [Sofia]; Институт по информационни и комуникационни технологии, Българска академия на науките [Institute of Information and Communication Technologies, Bulgarian Academy of Sciences], София [Sofia]; The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on identifying the senses used in BulTreeBank, but the missing senses of a lemma have also been covered by exploring larger corpora. The identified senses have been organized into synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered, both from a cross-lingual perspective and with a view to ensuring maximum connectivity and the potential to incorporate language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.; Polish title: Oparty na danych WordNet bułgarski: BTBWN; https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1713; Bulgarian WordNet; WordNet mappings; data-driven WordNet construction
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Petya Osenova, Kiril Simov
spellingShingle	Petya Osenova; Kiril Simov; The data-driven Bulgarian WordNet: BTBWN; Cognitive Studies | Études cognitives; Bulgarian WordNet; WordNet mappings; data-driven WordNet construction
author_facet	Petya Osenova, Kiril Simov
author_sort	Petya Osenova
title	The data-driven Bulgarian WordNet: BTBWN
title_sort	data-driven bulgarian wordnet: btbwn
publisher	Institute of Slavic Studies, Polish Academy of Sciences
series	Cognitive Studies | Études cognitives
issn	2392-2397
publishDate	2018-12-01
description	The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on identifying the senses used in BulTreeBank, but the missing senses of a lemma have also been covered by exploring larger corpora. The identified senses have been organized into synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered, both from a cross-lingual perspective and with a view to ensuring maximum connectivity and the potential to incorporate language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval. Polish title: Oparty na danych WordNet bułgarski: BTBWN.
topic	Bulgarian WordNet; WordNet mappings; data-driven WordNet construction
url	https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1713

Similar Items