A Study on Automatic Recognition on Exact Synonyms between Traditional and Simplified Chinese

Bibliographic Details
Main Author: 黃群弼
Other Authors: Liu, Jyi Shane
Format: Others
Language: zh-TW
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/23544608046647586329
Description
Summary: Master's thesis === National Chengchi University (國立政治大學) === Department of Computer Science (資訊科學學系) === 97 === Traditional Chinese and Simplified Chinese differ not only in character forms and computer encodings but also, in part, in vocabulary usage. Words that are used differently on the two sides yet carry the same meaning are called synonyms. Such synonyms create obstacles and misunderstandings in exchanges between the two communities, for example in conversation, in the translation of documents and books, and in the conversion of software systems. At present they are picked out manually, which costs a great deal of time and effort and is error-prone. If a computer could automatically identify the synonyms between Traditional and Simplified Chinese, the identified synonyms could serve as hints for resolving these misunderstandings. In the experimental design, automatic identification first requires building a Traditional and Simplified Chinese corpus for the computer domain and another for the general domain as the basis for identification. The research framework is divided into two stages and uses three methods. The first stage applies the first method, N-gram, to identify synonyms and examines whether this single method is effective on its own. The second stage applies the second method, PMI-IR & LC-IR, and the third method, Context Vector, and examines whether they raise identification performance. In line with its aim of automatically recognizing exact synonyms between Traditional and Simplified Chinese, this study proposes a new identification framework: N-gram provides a preliminary automatic recognition of exact synonyms, and the PMI-IR & LC-IR and Context Vector methods improve precision by about 0 to 20%.
In conclusion, on corpora drawn from the two language varieties, N-gram can recognize exact synonyms, and the PMI-IR & LC-IR and Context Vector methods improve on the performance of that single method.
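
As a rough illustration of two building blocks named in the summary, the following Python sketch extracts character N-grams from a string and computes a pointwise mutual information (PMI) score from co-occurrence counts. It is not the thesis's implementation: the function names `char_ngrams` and `pmi`, the choice of log base 2, and the use of plain document counts in place of search-engine hit counts are all assumptions made for this example.

```python
import math


def char_ngrams(text, n):
    """Return every contiguous character n-gram of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def pmi(hits_xy, hits_x, hits_y, total):
    """PMI (log base 2) of terms x and y, estimated from document counts.

    hits_xy: documents containing both terms
    hits_x, hits_y: documents containing each term
    total: documents in the collection (hypothetical counts, not thesis data)
    """
    if min(hits_xy, hits_x, hits_y) <= 0:
        # No evidence of co-occurrence: treat the association as minimal.
        return float("-inf")
    return math.log2((hits_xy * total) / (hits_x * hits_y))


if __name__ == "__main__":
    # Candidate terms can be proposed as bigrams of a Chinese string.
    print(char_ngrams("軟體工程", 2))
    # Illustrative counts only; a higher score suggests a stronger association.
    print(pmi(hits_xy=50, hits_x=100, hits_y=200, total=10000))
```

In a pipeline of the kind the summary describes, N-grams would supply candidate terms and an association score such as PMI would help rank candidate pairs; how the thesis combines these steps is not specified in this record.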