The importance of sampling frames in representative historical corpora : a case study of Parisian theater

Cognitive linguistics makes specific claims about language use, and corpora are our most powerful tool to test those claims. Representative sampling (Laplace 1814) is a technique that allows us to study smaller, more manageable corpora, and generalize our results to a broader sampling frame. For a s...

Bibliographic Details
Main Author: Angus B. Grieve-Smith
Format: Article
Language: English
Published: Association Française de Linguistique Cognitive 2019-06-01
Series: CogniTextes
Subjects: Language change; French language; corpus design; sampling; usage-based; type frequency
Online Access: http://journals.openedition.org/cognitextes/1671
ISSN: 1958-5322
Volume: 19
DOI: 10.4000/cognitextes.1671
Source: DOAJ
Description: Cognitive linguistics makes specific claims about language use, and corpora are our most powerful tool to test those claims. Representative sampling (Laplace 1814) is a technique that allows us to study smaller, more manageable corpora, and generalize our results to a broader sampling frame. For a sampled corpus to be relevant to our research questions, its sampling frame must have an understandable connection to the subject of our research question. In my dissertation study (Grieve-Smith 2009) I tested the type frequency hypothesis of analogical extension (Bybee 1995) using the FRANTEXT corpus (CNRTL 2018). In this study I test the theatrical texts in FRANTEXT from 1800-1815 against the new Digital Parisian Stage corpus, sampled from Wicks (1950 et seq.), a catalog of every play that premiered in Paris in the nineteenth century. Declarative sentence negations in the Digital Parisian Stage corpus occurred with ne … pas in 73.9% of tokens, while in FRANTEXT they only occurred with ne … pas in 50.5% of tokens. This shows that FRANTEXT is biased in favor of elite literary language. To properly test usage-based theories of language change we will need a representative corpus covering a century or more.
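
Illustrative note: the abstract describes a corpus sampled from a catalog that serves as its sampling frame. The record does not say how the sample was drawn, so the following is only a minimal sketch of one common approach, simple random sampling without replacement. The function name `draw_sample`, the placeholder identifiers, and the frame and sample sizes are all assumptions made for illustration; none of them come from the article or from the actual catalog.

```python
import random


def draw_sample(frame, k, seed=0):
    """Return a simple random sample of k items from the sampling frame.

    Sampling is without replacement, so no item is selected twice.
    A fixed seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(frame, k)


# Hypothetical stand-in for a catalog used as a sampling frame.
# These are placeholder identifiers, not real catalog entries.
catalog = [f"play-{i:04d}" for i in range(1, 1001)]

# Hypothetical sample size, chosen only for the example.
sample = draw_sample(catalog, 20, seed=42)
print(len(sample), "items drawn from a frame of", len(catalog))
```

Because every item in the frame has the same chance of selection, results measured on such a sample can be generalized to the frame itself, which is the property the abstract relies on when it contrasts the two corpora.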