Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus

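The surprisal of a word on a probabilistic grammar constitutes a promising complexity metric for human sentence comprehension difficulty. Using two different grammar types, surprisal is shown to have an effect on fixation durations and regression probabilities in a sample of German readers’ eye movements, the Potsdam Sentence Corpus. A linear mixed-effects model was used to quantify the effect of surprisal while taking into account unigram frequency and bigram frequency (transitional probability), word length, and empirically-derived word predictability; the so-called “early” and “late” measures of processing difficulty both showed an effect of surprisal. Surprisal is also shown to have a small but statistically non-significant effect on empirically-derived predictability itself. This work thus demonstrates the importance of including parsing costs as a predictor of comprehension difficulty in models of reading, and suggests that a simple identification of syntactic parsing costs with early measures and late measures with durations of post-syntactic events may be difficult to uphold.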

Bibliographic Details
Main Authors: Marisa Ferrara Boston, John Hale, Reinhold Kliegl, Umesh Patil, Shravan Vasishth
Format: Article
Language: English
Published: Bern Open Publishing 2008-09-01
Series: Journal of Eye Movement Research
Subjects: surprisal, parsing costs, Potsdam Sentence Corpus, parsing difficulty
Online Access: https://bop.unibe.ch/JEMR/article/view/2255
id doaj-f766613505ce45679535606bcea91a22
record_format Article
timestamp 2021-05-28T13:34:55Z
volume 2
issue 1
doi 10.16910/jemr.2.1.1
author_affiliation Marisa Ferrara Boston (Cornell University)
John Hale (Cornell University)
Reinhold Kliegl (University of Potsdam)
Umesh Patil (University of Potsdam)
Shravan Vasishth (University of Potsdam)
collection DOAJ
language English
format Article
sources DOAJ
author Marisa Ferrara Boston
John Hale
Reinhold Kliegl
Umesh Patil
Shravan Vasishth
publisher Bern Open Publishing
series Journal of Eye Movement Research
issn 1995-8692
publishDate 2008-09-01
description The surprisal of a word on a probabilistic grammar constitutes a promising complexity metric for human sentence comprehension difficulty. Using two different grammar types, surprisal is shown to have an effect on fixation durations and regression probabilities in a sample of German readers’ eye movements, the Potsdam Sentence Corpus. A linear mixed-effects model was used to quantify the effect of surprisal while taking into account unigram frequency and bigram frequency (transitional probability), word length, and empirically-derived word predictability; the so-called “early” and “late” measures of processing difficulty both showed an effect of surprisal. Surprisal is also shown to have a small but statistically non-significant effect on empirically-derived predictability itself. This work thus demonstrates the importance of including parsing costs as a predictor of comprehension difficulty in models of reading, and suggests that a simple identification of syntactic parsing costs with early measures and late measures with durations of post-syntactic events may be difficult to uphold.
topic surprisal
parsing costs
Potsdam Sentence Corpus
parsing difficulty
url https://bop.unibe.ch/JEMR/article/view/2255