Víceslovné lexémy v syntaktickém kontextu (Multi-word lexemes in syntactic context)

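We start with the assumption that (i) a corpus represents the use of language, i.e. linguistic performance, (ii) a rule-based grammar represents language as a system, i.e. linguistic competence, and (iii) corpus annotation represents the interface between the two. To detect and diagnose mismatches between the language use and the language system we use a constraint-based grammar run as a constraint solver on texts tagged and dependency-parsed by stochastic tools.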

Bibliographic Details
Main Authors:	Alexandr Rosen, Hana Skoumalová, Jiří Znamenáček
Format:	Article
Language:	Czech (ces)
Published:	Univerzita Karlova, Filozofická fakulta 2020-11-01
Series:	Studie z Aplikované Lingvistiky
Subjects:	Czech; HPSG; syntax; treebank; multi-word expressions
Online Access:	https://studiezaplikovanelingvistiky.ff.cuni.cz/wp-content/uploads/sites/19/2020/11/Alexandr_Rosen_-_Hana_Skoumalova_-_Jiri_Znamenacek_63-84.pdf

id	doaj-689bdf012b4a448c8d7d2320d90a0775
record_format	Article
spelling	doaj-689bdf012b4a448c8d7d2320d90a0775 | 2020-11-25T04:08:08Z | ces | Univerzita Karlova, Filozofická fakulta | Studie z Aplikované Lingvistiky | 1804-3240, 2336-6702 | 2020-11-01 | volume 11, issue 2, pages 63-84 | Víceslovné lexémy v syntaktickém kontextu | Alexandr Rosen (0), Hana Skoumalová (1), Jiří Znamenáček (2) | (0) Ústav teoretické a komputační lingvistiky FF UK; (1) Ústav teoretické a komputační lingvistiky FF UK; (2) Ústav informatiky a chemie, Vysoká škola chemicko-technologická v Praze | https://studiezaplikovanelingvistiky.ff.cuni.cz/wp-content/uploads/sites/19/2020/11/Alexandr_Rosen_-_Hana_Skoumalova_-_Jiri_Znamenacek_63-84.pdf | czech; hpsg; syntax; treebank; multi-word expressions
collection	DOAJ
language	ces
format	Article
sources	DOAJ
author	Alexandr Rosen; Hana Skoumalová; Jiří Znamenáček
author_sort	Alexandr Rosen
title	Víceslovné lexémy v syntaktickém kontextu
publisher	Univerzita Karlova, Filozofická fakulta
series	Studie z Aplikované Lingvistiky
issn	1804-3240 2336-6702
publishDate	2020-11-01
description	We start with the assumption that (i) a corpus represents the use of language, i.e. linguistic performance, (ii) a rule-based grammar represents language as a system, i.e. linguistic competence, and (iii) corpus annotation represents the interface between the two. To detect and diagnose mismatches between the language use and the language system we use a constraint-based grammar run as a constraint solver on texts tagged and dependency-parsed by stochastic tools. The texts also have MWEs (multi-word expressions) identified and transformed into a constituency-based format before the grammar is applied. We describe the role and results of the grammar, and its use to check texts annotated with morphosyntactic categories, syntactic structure and information about the status of relevant expressions as MWEs. The grammar also employs lexical resources such as a valency lexicon and a database of MWEs to make the checking more accurate and the annotation more informative. The results are represented as typed feature structures where MWE-related information can be shared by lexical and phrasal nodes. This allows for the annotation of MWEs as lexical units, independently of their analysis in terms of syntactic structure. Focusing on the interplay of MWEs with their syntactic context we analyse a number of representative examples, pointing out the pros and cons of specific solutions and the whole approach.
topic	czech; hpsg; syntax; treebank; multi-word expressions
url	https://studiezaplikovanelingvistiky.ff.cuni.cz/wp-content/uploads/sites/19/2020/11/Alexandr_Rosen_-_Hana_Skoumalova_-_Jiri_Znamenacek_63-84.pdf

Similar Items