Automatic Chinese Character Error Detecting System Based on N-gram Language Model and Pragmatics Knowledge Base

Bibliographic Details
Main Author: Ta-Hung Hung (洪大弘)
Other Authors: Shih-Hung Wu
Format: Others
Language: zh-TW
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/52207006755338310956
Description
Summary: Master's thesis === Chaoyang University of Technology === Department of Information Engineering, Master's Program === Academic year 97 (ROC calendar) === Essay error detection is an important function of computer-aided essay composition, and systems that can detect spelling and usage errors are very helpful to students. Previous systems based on a confusion set for each Chinese character tended to raise false alarms and did not explain the errors. To overcome these drawbacks, we implement an error detection system for Chinese essays based on statistical methods and a knowledge base; it labels errors and gives suggestions. Previous work focused on all possible errors arising from words with similar shapes or pronunciations. In addition to common error patterns, we collect a corpus of correct usage, such as idioms, maxims, and slang, which provides context for potential errors. Our system makes its decisions with an n-gram language model; once a word is labeled as an error, the system gives an explanation based on the correct context, so students receive information they can use to improve their essays. Applying a language model traditionally faces two difficulties: data sparseness and data adaptability. To address the data sparseness problem of the n-gram language model, we adopt several smoothing methods; to address adaptability, the system combines two language models to fit students' usage. With a large knowledge base containing thousands of common error patterns, the system can better identify error candidates. The experiments use both simulated data and a real essay corpus. We report the system's recall and precision, give an error analysis, and discuss its possible benefits. We believe the system can help students and teachers not only in class but also in distance learning over the Internet.
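The abstract describes the detection step only at a high level. The following is a minimal sketch, not the thesis's implementation, of how an n-gram model combined with a knowledge base of confusable characters could flag errors. It assumes a character bigram model, add-k smoothing as a stand-in for the unnamed smoothing methods, and linear interpolation (weight `lam`) as one possible way of "combining two language models". The confusion dictionary, the toy sentences, and the names `BigramLM`, `InterpolatedLM`, `detect_errors`, and `threshold` are all hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Character bigram model with add-k smoothing.

    Add-k is only a stand-in: the abstract says several smoothing methods
    were used but does not name them.
    """

    def __init__(self, sentences, k=1.0):
        self.k = k
        self.context = Counter()  # how often each token precedes another
        self.bigrams = Counter()
        vocab = set()
        for s in sentences:
            chars = ["<s>"] + list(s)
            vocab.update(s)
            self.context.update(chars[:-1])
            self.bigrams.update(zip(chars, chars[1:]))
        self.V = len(vocab) + 1  # +1 reserves mass for unseen characters

    def prob(self, prev, cur):
        # Unseen bigrams still get k / (count(prev) + k * V) > 0.
        return (self.bigrams[(prev, cur)] + self.k) / (
            self.context[prev] + self.k * self.V
        )


class InterpolatedLM:
    """Linear interpolation of a general-corpus model and a student-essay
    model; the weight `lam` is a hypothetical parameter."""

    def __init__(self, general, student, lam=0.7):
        self.general, self.student, self.lam = general, student, lam

    def logprob(self, text):
        chars = ["<s>"] + list(text)
        return sum(
            math.log(self.lam * self.general.prob(p, c)
                     + (1 - self.lam) * self.student.prob(p, c))
            for p, c in zip(chars, chars[1:])
        )


def detect_errors(text, lm, confusion, threshold=0.5):
    """Flag position i when replacing text[i] with a confusable character
    raises the sentence log-probability by more than `threshold` nats."""
    base = lm.logprob(text)
    flags = []
    for i, ch in enumerate(text):
        for cand in confusion.get(ch, ()):
            fixed = text[:i] + cand + text[i + 1:]
            gain = lm.logprob(fixed) - base
            if gain > threshold:
                flags.append((i, ch, cand, round(gain, 2)))
    return flags


# Hypothetical toy data: a tiny "knowledge base" of confusable characters.
confusion = {"在": ["再"], "再": ["在"]}
general = BigramLM(["明天再見", "再見了", "他在家", "我在學校"])
student = BigramLM(["老師再見", "我在家"])
lm = InterpolatedLM(general, student)

print(detect_errors("明天在見", lm, confusion))  # flags 在 -> 再 at index 2
print(detect_errors("他在家", lm, confusion))    # []
```

Interpolating with a student-essay model keeps usage common in student writing from being penalized when the general corpus is sparse, and the threshold trades recall against precision: lowering it flags more candidates at the cost of more false alarms.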