Identifying Common Erroneous Patterns for Auto Editing



Bibliographic Details
Main Author: An-Ta Huang (黃安達)
Other Authors: Shou-De Lin
Format: Others
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/39561598991412577425
Description
Summary: Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 98 (ROC calendar). This thesis describes a framework that extracts effective correction rules from a sentence-aligned corpus, and presents a practical application: auto-editing with the discovered rules. The framework computes the Levenshtein distance between sentences to identify the key parts of each rule, and then uses the editing corpus to filter, condense, and refine the rules. The rule candidates take the form A => B, where A is the erroneous pattern and B is the correct pattern. In addition, we focus on making the rules more general. Finally, we use part-of-speech (POS) information to generalize the rules further, so that a rule can be applied to different sentences that share a similar POS pattern. The framework is language independent and can therefore be applied easily to other languages. In the evaluation, experts annotated 67.2% of the 1,500 top-ranked rules as correct or mostly correct. Based on these rules, we built an online auto-editing system; a demo is available at http://mslab.csie.ntu.edu.tw/~kw/new_demo.html.
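The summary does not spell out the exact extraction procedure, so the following is only a minimal sketch of the general idea under stated assumptions: sentences are whitespace-tokenized, the Levenshtein distance is computed over words rather than characters, and a rule A => B is taken to be the differing span plus a fixed window of one matching token on each side. The function names `align`, `extract_rule`, and `apply_rule` and the `context` parameter are hypothetical illustrations, not the thesis's implementation. Corpus-based filtering and POS generalization are not covered.

```python
from typing import List, Optional, Tuple


def align(src: List[str], tgt: List[str]) -> List[Tuple[str, int, int]]:
    """Word-level Levenshtein alignment (hypothetical helper).

    Returns operations ("match", "sub", "del", "ins") with the source and
    target token indices they refer to (-1 when a side is absent).
    """
    n, m = len(src), len(tgt)
    # dp[i][j] = edit distance between src[:i] and tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a source token
                           dp[i][j - 1] + 1,         # insert a target token
                           dp[i - 1][j - 1] + cost)  # match or substitute
    # Walk back from the bottom-right cell to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            ops.append(("match" if src[i - 1] == tgt[j - 1] else "sub", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", i - 1, -1))
            i -= 1
        else:
            ops.append(("ins", -1, j - 1))
            j -= 1
    ops.reverse()
    return ops


def extract_rule(src: List[str], tgt: List[str],
                 context: int = 1) -> Optional[Tuple[Tuple[str, ...], Tuple[str, ...]]]:
    """Return a candidate rule (A, B): the edited span plus `context` aligned
    tokens on each side. Returns None if the sentences are identical."""
    ops = align(src, tgt)
    edits = [k for k, op in enumerate(ops) if op[0] != "match"]
    if not edits:
        return None
    lo = max(edits[0] - context, 0)
    hi = min(edits[-1] + context, len(ops) - 1)
    a = tuple(src[s] for _, s, _ in ops[lo:hi + 1] if s >= 0)
    b = tuple(tgt[t] for _, _, t in ops[lo:hi + 1] if t >= 0)
    return a, b


def apply_rule(tokens: List[str], rule: Tuple[Tuple[str, ...], Tuple[str, ...]]) -> List[str]:
    """Replace every non-overlapping occurrence of A with B, left to right."""
    a, b = rule
    if not a:  # an empty pattern would match everywhere; leave the input unchanged
        return list(tokens)
    out, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + len(a)]) == a:
            out.extend(b)
            i += len(a)
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    rule = extract_rule("I am interested on this topic .".split(),
                        "I am interested in this topic .".split())
    print(" ".join(rule[0]), "=>", " ".join(rule[1]))
    print(" ".join(apply_rule("She is interested on this idea".split(), rule)))
```

In this sketch the context window keeps a rule such as "interested on this => interested in this" from firing on every occurrence of "on". A wider window makes rules more precise but less general, which is the trade-off that the summary's POS-based generalization addresses.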