Natural language techniques for error correction
Main Author: | Bowden, T. G. |
---|---|
Published: | University of Cambridge, 1997 |
Subjects: | 006.3 |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.596815 |
id | ndltd-bl.uk-oai-ethos.bl.uk-596815 |
---|---|
record_format | oai_dc |
type | Electronic Thesis or Dissertation |
collection | NDLTD |
sources | NDLTD |
topic | 006.3 |
description | Dealing with human errors such as spelling or grammar mistakes is a necessary part of natural language processing. The aim of this project was to investigate how far error detection and correction could proceed when the system's purview was limited to a sub-sentential stretch of text. This restriction comes from cooperative error handling: detecting and correcting errors just after user entry, while the user is entering further text. Short-context, or shallow, processing is also interesting because it is potentially cheaper and faster than a full-scale parse, and because sentential constraints become less reliable when the 'sentence' is ill-formed. There has been no previous report on the effectiveness of local syntactic constraints on general (English) ill-formedness. Additionally, all error-processing programs, other than some working in very restricted domains, have been post-processors rather than cooperative. Being post-processors, previous programs have been concerned with errors left undetected after some degree of proofreading; cooperative processing is also aimed at the errors people spend time backtracking to catch. In the absence of suitable existing data, a corpus of keystrokes made by subjects entering a piece of text was compiled; errors were classified as caught or uncaught, and various interesting analyses emerged. For context-less processing, a method based on morphological error rules and another based on binary positional trigrams were devised and compared. Then, to incorporate context, local syntactic constraints based on tag information were implemented, using bigram and trigram co-occurrence checks with a Markov tagging procedure. The tag-based constraints were compared with a partial parsing method. These error handlers were evaluated on data from the Keystroke Corpus and on other data, both manufactured and collected. The morphological error rules and the tag-based checks using very short context were the most promising. As far as current comparisons allow (reported results in this area are scarce), the short-context techniques implemented here compared well against full-parsing error handlers. Ideas outlined for future work include a method for further identifying detected word scope errors and a practical, usable cooperative corrector based on an extension of an existing commercial application. |
author | Bowden, T. G. |
title | Natural language techniques for error correction |
publisher | University of Cambridge |
publishDate | 1997 |
url | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.596815 |
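The abstract names binary positional trigrams as one of the context-less detection methods but does not describe the variant used in the thesis. The following is a minimal Python sketch of the textbook formulation, assuming that a word of length n is flagged when some triple of letter positions i < j < k holds a letter combination never seen at those positions in any dictionary word of the same length. A set of seen keys stands in for the binary arrays, and the names `positional_trigrams`, `build_positional_trigrams` and `is_suspect` are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

# Hypothetical key layout (assumption, not from the thesis):
# (word length, i, j, k, letter at i, letter at j, letter at k).
# Membership in the "seen" set plays the role of a 1 in the binary array.
Key = Tuple[int, int, int, int, str, str, str]


def positional_trigrams(word: str) -> Iterable[Key]:
    """Yield one key for every triple of positions i < j < k in the word."""
    n = len(word)
    for i, j, k in combinations(range(n), 3):
        yield (n, i, j, k, word[i], word[j], word[k])


def build_positional_trigrams(dictionary: Iterable[str]) -> Set[Key]:
    """Collect every key whose binary-array cell would be set to 1."""
    seen: Set[Key] = set()
    for word in dictionary:
        seen.update(positional_trigrams(word.lower()))
    return seen


def is_suspect(word: str, seen: Set[Key]) -> bool:
    """Flag a word if any of its positional trigrams never occurs in the dictionary."""
    word = word.lower()
    if len(word) < 3:
        return False  # too short to form a trigram; left to other checks
    return any(key not in seen for key in positional_trigrams(word))


if __name__ == "__main__":
    seen = build_positional_trigrams(["the", "cat", "sat", "mat", "hat"])
    for candidate in ["cat", "teh", "Hat"]:
        # prints: cat False, then teh True, then Hat False
        print(candidate, is_suspect(candidate, seen))
```

Because each positional trigram is checked independently, the method can accept some non-words: with the toy dictionary `["abcx", "abyd", "azcd", "qbcd"]`, the string `abcd` passes every check even though it is not listed. For three-letter words the check reduces to exact dictionary lookup.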