Quantification of textual comprehension difficulty with an information theory-based algorithm

Bibliographic Details
Main Authors: Costa, K.M. (Author), da Silva Filho, M. (Author), Ribeiro, L.B. (Author), Rodrigues, A.R. (Author)
Format: Article
Language: English
Published: Palgrave Macmillan Ltd. 2019
Online Access: View Fulltext in Publisher
LEADER 02038nam a2200169Ia 4500
001 10.1057-s41599-019-0311-0
008 220511s2019 CNT 000 0 eng d
022 |a 2055-1045 (ISSN) 
245 1 0 |a Quantification of textual comprehension difficulty with an information theory-based algorithm 
260 0 |b Palgrave Macmillan Ltd.  |c 2019 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1057/s41599-019-0311-0 
520 3 |a Textual comprehension is often not adequately acquired despite intense didactic efforts, and its quality is mostly evaluated using subjective criteria. Starting from the assumption that word usage statistics may be used to infer the probability of successful semantic representations, we hypothesized that textual comprehension depends on words with high occurrence probability (i.e., a high degree of familiarity), which is typically inversely proportional to their information entropy. We tested this hypothesis by quantifying word occurrences in a bank of words from Portuguese-language academic theses and using information-theoretic tools to infer degrees of textual familiarity. We found that the lower and upper bounds of the database were delimited by low-entropy words with the highest probabilities of causing incomprehension (i.e., nouns and adjectives) or facilitating semantic decoding (i.e., prepositions and conjunctions). We developed an openly available software suite called CalcuLetra for implementing these algorithms and tested it on publicly available denotative text samples (e.g., articles, essays, and abstracts). We propose that the quantitative model presented here may apply to other languages and could be a tool for supporting automated textual comprehension evaluations, potentially assisting the development of teaching materials or the diagnosis of learning disorders. © 2019, The Author(s). 
700 1 |a Costa, K.M.  |e author 
700 1 |a da Silva Filho, M.  |e author 
700 1 |a Ribeiro, L.B.  |e author 
700 1 |a Rodrigues, A.R.  |e author 
773 |t Palgrave Communications
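
Editor's note on the method described in the 520 abstract above: the paper ranks words by occurrence probability in a thesis-derived word bank and uses information theory to score familiarity. The following is a minimal Python sketch of that idea, not CalcuLetra itself (the authors' software). The inline miniature corpus, function names, and the fallback probability for unseen words are illustrative assumptions, and it uses Shannon self-information, -log2 p(w), as the difficulty measure, which may differ from the paper's exact entropy formulation.

import math
import re
from collections import Counter

def word_probabilities(corpus_text):
    """Build p(w) = count(w) / total word count from a reference corpus."""
    words = re.findall(r"\w+", corpus_text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def mean_self_information(text, probs, unseen_prob=1e-6):
    """Average -log2 p(w) over a text's words; more bits = less familiar.
    unseen_prob is an assumed fallback for words absent from the corpus."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    bits = [-math.log2(probs.get(w, unseen_prob)) for w in words]
    return sum(bits) / len(bits)

# Hypothetical miniature corpus standing in for the thesis word bank.
corpus = ("o estudo apresenta a análise estatística de palavras em teses "
          "a ocorrência de cada palavra define a sua probabilidade no banco de palavras")
probs = word_probabilities(corpus)

# Common, high-probability words score few bits; rare content words score many.
print(mean_self_information("a análise de palavras", probs))
print(mean_self_information("epistemologia hermenêutica", probs))

On this toy corpus the first sample averages only a few bits per word, while the second, composed of unseen content words, approaches -log2(1e-6), roughly 19.9 bits per word, mirroring the abstract's contrast between familiar function words and incomprehension-prone nouns and adjectives.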