Studying text coherence in Czech – a corpus-based analysis

The paper deals with the field of Czech corpus linguistics and represents one of various current studies analysing text coherence through language interactions. It presents a corpus-based analysis of grammatical coreference and sentence information structure (in terms of contextual boundness) in Czec...

Bibliographic Details
Main Author: Rysová Magdaléna
Format: Article
Language: English
Published: Sciendo 2017-12-01
Series: Topics in Linguistics
Subjects: sentence information structure, coreference, corpus analysis, Czech
Online Access: https://doi.org/10.1515/topling-2017-0009
id doaj-4164e14b80a04a028ca3e18bcb5ab858
record_format Article
spelling doaj-4164e14b80a04a028ca3e18bcb5ab858 | 2021-09-05T20:51:32Z | eng | Sciendo | Topics in Linguistics | 1337-7590 | 2199-6504 | 2017-12-01 | 18 | 2 | 36 | 47 | 10.1515/topling-2017-0009 | topling-2017-0009 | Studying text coherence in Czech – a corpus-based analysis | Rysová Magdaléna | Department of English, Faculty of International Relations, University of Economics, W. Churchill Sq. 4, Prague, Czech Republic | https://doi.org/10.1515/topling-2017-0009 | sentence information structure | coreference | corpus analysis | czech
collection DOAJ
language English
format Article
sources DOAJ
author Rysová Magdaléna
spellingShingle Rysová Magdaléna
Studying text coherence in Czech – a corpus-based analysis
Topics in Linguistics
sentence information structure
coreference
corpus analysis
czech
author_facet Rysová Magdaléna
author_sort Rysová Magdaléna
title Studying text coherence in Czech – a corpus-based analysis
title_short Studying text coherence in Czech – a corpus-based analysis
title_full Studying text coherence in Czech – a corpus-based analysis
title_fullStr Studying text coherence in Czech – a corpus-based analysis
title_full_unstemmed Studying text coherence in Czech – a corpus-based analysis
title_sort studying text coherence in czech – a corpus-based analysis
publisher Sciendo
series Topics in Linguistics
issn 1337-7590
2199-6504
publishDate 2017-12-01
description The paper deals with the field of Czech corpus linguistics and represents one of various current studies analysing text coherence through language interactions. It presents a corpus-based analysis of grammatical coreference and sentence information structure (in terms of contextual boundness) in Czech. It focuses on examining the interaction of these two language phenomena and observes where they meet to participate in text structuring. Specifically, the paper analyses contextually bound and non-bound sentence items and examines whether (and how often) they are involved in relations of grammatical coreference in Czech newspaper articles. The analysis is carried out on the language data of the Prague Dependency Treebank (PDT) containing 3,165 Czech texts. The results of the analysis are helpful in automatic text annotation - the paper presents how (or to what extent) the annotation of grammatical coreference may be used in automatic (pre-)annotation of sentence information structure in Czech. It demonstrates how accurately we may (automatically) assume the value of contextual boundness for the antecedent and anaphor (as the two participants of a grammatical coreference relation). The results of the paper demonstrate that the anaphor of grammatical coreference is automatically predictable - it is a non-contrastive contextually bound sentence item in 99.18% of cases. On the other hand, the value of contextual boundness of the antecedent is not so easy to estimate (according to the PDT, the antecedent is contextually non-bound in 37% of cases, non-contrastive contextually bound in 50% and contrastive contextually bound in 13% of cases).
topic sentence information structure
coreference
corpus analysis
Czech
url https://doi.org/10.1515/topling-2017-0009
work_keys_str_mv AT rysovamagdalena studyingtextcoherenceinczechacorpusbasedanalysis
_version_ 1717783630909014016