The importance of sampling frames in representative historical corpora : a case study of Parisian theater
Cognitive linguistics makes specific claims about language use, and corpora are our most powerful tool to test those claims. Representative sampling (Laplace 1814) is a technique that allows us to study smaller, more manageable corpora and generalize our results to a broader sampling frame. For a sampled corpus to be relevant to our research questions, its sampling frame must have an understandable connection to the subject of our research question.

In my dissertation study (Grieve-Smith 2009) I tested the type frequency hypothesis of analogical extension (Bybee 1995) using the FRANTEXT corpus (CNRTL 2018). In this study I test the theatrical texts in FRANTEXT from 1800–1815 against the new Digital Parisian Stage corpus, sampled from Wicks (1950 et seq.), a catalog of every play that premiered in Paris in the nineteenth century. Declarative sentence negations in the Digital Parisian Stage corpus occurred with ne … pas in 73.9% of tokens, while in FRANTEXT they occurred with ne … pas in only 50.5% of tokens. This shows that FRANTEXT is biased in favor of elite literary language. To properly test usage-based theories of language change, we will need a representative corpus covering a century or more.

Main Author: | Angus B. Grieve-Smith |
---|---|
Format: | Article |
Language: | English |
Published: | Association Française de Linguistique Cognitive, 2019-06-01 |
Series: | CogniTextes |
ISSN: | 1958-5322 |
DOI: | 10.4000/cognitextes.1671 |
Subjects: | Language change; French language; corpus design; sampling; usage-based; type frequency |
Collection: | DOAJ |
Online Access: | http://journals.openedition.org/cognitextes/1671 |