Natural language techniques for error correction

Dealing with human errors such as spelling or grammar mistakes is a necessary part of natural language processing. The aim of this project was to investigate how far error detection and correction could proceed when the system purview was set to a sub-sentential stretch of text. This restriction comes...

Bibliographic Details
Main Author: Bowden, T. G.
Published: University of Cambridge 1997
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.596815
id ndltd-bl.uk-oai-ethos.bl.uk-596815
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-596815; 2015-03-20T06:09:35Z; Natural language techniques for error correction; Bowden, T. G.; 1997; 006.3; University of Cambridge; http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.596815; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 006.3
description Dealing with human errors such as spelling or grammar mistakes is a necessary part of natural language processing. The aim of this project was to investigate how far error detection and correction could proceed when the system purview was set to a sub-sentential stretch of text. This restriction comes from cooperative error handling: detecting/correcting errors just after user entry, as the user is entering further text. Short context, or shallow, processing is also interesting because it is potentially cheaper and faster than a full-scale parse and because sentential constraints become less reliable when the 'sentence' is ill-formed. There has been no previous report on the effectiveness of local syntactic constraints on general (English) ill-formedness. Additionally, all error processing programs, other than some working in very restricted domains, have been post-processors rather than cooperative. Being post-processors, previous programs have been concerned with errors left undetected after some degree of proofreading. Cooperative processing is also aimed at the errors people spend time backtracking to catch. In the absence of suitable existing data, a corpus of keystrokes made by subjects entering a piece of text was collated; errors were classified as caught or uncaught and various interesting analyses emerged. For context-less processing, a method based on morphological error rules and another based on binary positional trigrams were devised and compared. Then, to incorporate context, local syntactic constraints based on tag information were implemented, using bigram and trigram co-occurrence checks with a Markov tagging procedure. The tag-based constraints were compared with a partial parsing method. These error handlers were evaluated on data from the Keystroke Corpus and on other data manufactured and collected. The morphological error rules and tag-based checks using very short context were the most promising. As far as current comparison allows, there being a scarcity of reported results in this area, the short context techniques implemented here compared well against full-parsing error handlers. Ideas outlined for future work include a method for further identifying detected word scope errors and a practical, usable cooperative corrector based on an extension of an existing commercial application.
author Bowden, T. G.
title Natural language techniques for error correction
publisher University of Cambridge
publishDate 1997
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.596815