Korpus spontánní mluvené češtiny ORAL2013 : The Corpus of Spontaneous Spoken Czech ORAL 2013

The paper presents a corpus of spontaneous spoken Czech called ORAL2013, its design principles and practical solutions adopted during the data collection. The corpus is designed to represent contemporary spontaneous spoken language used in informal, real-life situations across the whole of the Czech...

Bibliographic Details
Main Authors:	Benešová, Lucie; Křen, Michal; Waclawičová, Martina
Format:	Article
Language:	ces
Published:	Univerzita Karlova, Filozofická fakulta 2015-10-01
Series:	Časopis pro Moderní Filologii
Subjects:	language corpus; corpus design; spontaneous spoken language; Czech; transcription
Online Access:	http://casopispromodernifilologii.ff.cuni.cz/wp-content/uploads/sites/9/2015/07/Lucie-Beneseva_-Michal-Kren_42-50.pdf

id	doaj-1b2201ef7bb047a1a3b7d8941a558598
record_format	Article
spelling	doaj-1b2201ef7bb047a1a3b7d8941a558598; 2020-11-25T01:12:19Z; ces; Univerzita Karlova, Filozofická fakulta; Časopis pro Moderní Filologii; ISSN 0008-7386, 2336-6591; 2015-10-01; vol. 97, no. 1, pp. 42-50. Korpus spontánní mluvené češtiny ORAL2013 : The Corpus of Spontaneous Spoken Czech ORAL 2013. Benešová, Lucie (Ústav Českého národního korpusu, FFUK, nám. J. Palacha 2, 116 38 Praha 1, lucie.benesova@ff.cuni.cz); Křen, Michal (Ústav Českého národního korpusu, FFUK, nám. J. Palacha 2, 116 38 Praha 1, michal.kren@ff.cuni.cz); Waclawičová, Martina (Ústav Českého národního korpusu, FFUK, nám. J. Palacha 2, 116 38 Praha 1, martina.waclawicova@ff.cuni.cz). The paper presents a corpus of spontaneous spoken Czech called ORAL2013, its design principles and practical solutions adopted during the data collection. The corpus is designed to represent contemporary spontaneous spoken language used in informal, real-life situations across the whole of the Czech Republic. The corpus consists of audio recordings and their transcriptions aligned with time stamps; it features manual annotation and broad regional coverage with a large variety of speakers. ORAL2013 contains 835 recordings from the period 2008 to 2011 made with 2,544 speakers (of whom 1,297 speakers are unique); the total length of the audio tracks is almost 300 hours and the total size of the transcriptions exceeds 3.28 million tokens. ORAL2013 is made publicly available by the Czech National Corpus at http://www.korpus.cz/. http://casopispromodernifilologii.ff.cuni.cz/wp-content/uploads/sites/9/2015/07/Lucie-Beneseva_-Michal-Kren_42-50.pdf Keywords: language corpus; corpus design; spontaneous spoken language; Czech; transcription
collection	DOAJ
language	ces
format	Article
sources	DOAJ
author	Benešová, Lucie; Křen, Michal; Waclawičová, Martina
author_sort	Benešová, Lucie
title	Korpus spontánní mluvené češtiny ORAL2013 : The Corpus of Spontaneous Spoken Czech ORAL 2013
publisher	Univerzita Karlova, Filozofická fakulta
series	Časopis pro Moderní Filologii
issn	0008-7386 2336-6591
publishDate	2015-10-01
description	The paper presents a corpus of spontaneous spoken Czech called ORAL2013, its design principles and practical solutions adopted during the data collection. The corpus is designed to represent contemporary spontaneous spoken language used in informal, real-life situations across the whole of the Czech Republic. The corpus consists of audio recordings and their transcriptions aligned with time stamps; it features manual annotation and broad regional coverage with a large variety of speakers. ORAL2013 contains 835 recordings from the period 2008 to 2011 made with 2,544 speakers (of whom 1,297 speakers are unique); the total length of the audio tracks is almost 300 hours and the total size of the transcriptions exceeds 3.28 million tokens. ORAL2013 is made publicly available by the Czech National Corpus at http://www.korpus.cz/.
topic	language corpus; corpus design; spontaneous spoken language; Czech; transcription
url	http://casopispromodernifilologii.ff.cuni.cz/wp-content/uploads/sites/9/2015/07/Lucie-Beneseva_-Michal-Kren_42-50.pdf

Similar Items