Development and Implementation of Chinese Syntax Parser
Master's thesis === National Kaohsiung University of Applied Sciences === Department of Information Engineering === Academic year 106 (ROC calendar)
Main Authors: | WU, ZHEN-HUI 吳振輝 |
---|---|
Other Authors: | CHANG, TAO-HSING 張道行 |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/8uq85t |
id | ndltd-TW-105KUAS0392003 |
---|---|
spelling | ndltd-TW-105KUAS0392003 2019-05-16T00:44:37Z · 中文句法剖析器之設計與實作 (Design and Implementation of a Chinese Syntax Parser) · 學位論文 (degree thesis) ; thesis 54 |
collection | NDLTD |
description |
Natural-language processing (NLP) is an active research topic, and the syntax parser is one of its basic tools: it enables a computer to obtain the structure of a sentence. Once that structure is available, the computer can process natural language in more advanced ways, so sentence parsing is an indispensable technique in NLP.
The main purpose of this study is to build a Chinese syntax parser. We extracted the information needed to establish syntactic rules from 61,087 Chinese parse trees in the Sinica Treebank and used it to build a syntactic rule database for the subsequent parsing process. All syntactic rules in the database were then converted to Chomsky normal form (CNF) via virtual substitution. To improve parsing performance, we used the Cocke–Younger–Kasami (CYK) algorithm. CYK is based on dynamic programming and parses a sentence in time cubic in its length, which reduces the time complexity of the parsing process.
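The rule-extraction and CNF steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's code:

* Each treebank tree is assumed to be already loaded as a nested Python tuple `(label, child, ...)`, with preterminals written as `(tag, word)`. The Sinica Treebank's own bracket notation is not parsed here.
* "Virtual substitution" is assumed to correspond to introducing virtual nonterminals, named here like `VP|<NP+N-NP+N>`, that split long right-hand sides into binary rules.
* Unary chains, which CNF does not allow, are merged into labels such as `NP+N`.
* The function names `collapse_unary`, `extract_rules`, `binarize`, and `cnf_grammar` and the toy labels are hypothetical.

```python
from collections import Counter

# ASSUMPTION (hypothetical format, not the Sinica Treebank notation):
# a tree is a nested tuple (label, child, child, ...) and a
# preterminal is (pos_tag, word) with word a plain string.

def collapse_unary(tree):
    """Merge unary chains A -> B into one node labelled 'A+B' (CNF forbids A -> B)."""
    label, *children = tree
    while len(children) == 1 and not isinstance(children[0], str):
        child_label, *children = children[0]
        label = f"{label}+{child_label}"
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0])  # preterminal: label -> word
    return (label, *(collapse_unary(child) for child in children))

def extract_rules(tree, counts):
    """Count every production A -> children used in the tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1  # lexical rule A -> w
        return
    counts[(label, tuple(child[0] for child in children))] += 1
    for child in children:
        extract_rules(child, counts)

def binarize(lhs, rhs):
    """Split A -> X1 X2 ... Xn (n > 2) into binary rules via hypothetical virtual nonterminals."""
    rules = []
    while len(rhs) > 2:
        virtual = f"{lhs}|<{'-'.join(rhs[1:])}>"
        rules.append((lhs, (rhs[0], virtual)))
        lhs, rhs = virtual, rhs[1:]
    rules.append((lhs, tuple(rhs)))
    return rules

def cnf_grammar(trees):
    """Collect rule counts from all trees and return them as a CNF rule Counter."""
    counts = Counter()
    for tree in trees:
        extract_rules(collapse_unary(tree), counts)
    cnf = Counter()
    for (lhs, rhs), n in counts.items():
        for rule in binarize(lhs, rhs):  # binary and lexical rules pass through unchanged
            cnf[rule] += n
    return cnf
```

For a toy ditransitive tree `("S", ("NP", ("N", "我")), ("VP", ("V", "送"), ("NP", ("N", "他")), ("NP", ("N", "書"))))`, the three-child VP becomes `VP -> V VP|<NP+N-NP+N>` plus `VP|<NP+N-NP+N> -> NP+N NP+N`. Counts are carried over to the split rules so that rule probabilities can still be estimated from them.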
Parsing a sentence typically yields several syntactic structures that all fit the sentence; that is, the sentence is syntactically ambiguous, and selecting the most reasonable structure among them is difficult. To solve this problem, we applied the probabilistic context-free grammar (PCFG) method so that each syntactic rule in the database carries a probability value. The quality of each candidate structure can then be evaluated by its probability, which is the product of the probabilities of the rules it uses, and the most reasonable structure is chosen for the sentence.
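Rule probabilities and the CYK chart can be combined into a probabilistic (Viterbi-style) CYK parser. The sketch below is an illustration under assumptions, not the thesis's implementation:

* Each probability is estimated by relative frequency, P(A → α) = count(A → α) / count(A). This is the usual maximum-likelihood estimate for a PCFG, but the abstract does not state which estimator was used.
* The grammar is expected to be already in CNF, with rules keyed as `(lhs, rhs_tuple)`.
* `estimate_pcfg`, `viterbi_cyk`, and the toy English grammar are hypothetical.

Each chart cell keeps only the best-scoring analysis per nonterminal, so the work grows with the cube of the sentence length, times a grammar-dependent factor, instead of with the number of ambiguous trees. Any virtual nonterminals from CNF conversion would still need to be spliced out of the returned tree.

```python
import math
from collections import defaultdict

def estimate_pcfg(rule_counts):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    lhs_totals = defaultdict(int)
    for (lhs, _), n in rule_counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in rule_counts.items()}

def viterbi_cyk(words, probs, start="S"):
    """Return (log probability, tree) of the most probable parse, or None.

    ASSUMPTION: probs is a CNF grammar, so every rhs is (word,) or (B, C).
    """
    n = len(words)
    lexical = defaultdict(list)  # word -> [(A, log P(A -> word))]
    binary = defaultdict(list)   # (B, C) -> [(A, log P(A -> B C))]
    for (lhs, rhs), p in probs.items():
        if len(rhs) == 1:
            lexical[rhs[0]].append((lhs, math.log(p)))
        else:
            binary[rhs].append((lhs, math.log(p)))

    # best[i][j] maps a nonterminal to (best log prob, backpointer) for words[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for lhs, lp in lexical.get(word, ()):
            best[i][i + 1][lhs] = (lp, word)

    # Dynamic programming over span lengths: each cell reuses smaller cells.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = best[i][j]
            for k in range(i + 1, j):  # split point between the two children
                for b, (lb, _) in best[i][k].items():
                    for c, (lc, _) in best[k][j].items():
                        for a, lp in binary.get((b, c), ()):
                            score = lp + lb + lc
                            if a not in cell or score > cell[a][0]:
                                cell[a] = (score, (k, b, c))

    if start not in best[0][n]:
        return None

    def build(i, j, label):
        """Follow backpointers to rebuild the winning tree."""
        _, back = best[i][j][label]
        if isinstance(back, str):  # preterminal over a single word
            return (label, back)
        k, b, c = back
        return (label, build(i, k, b), build(k, j, c))

    return best[0][n][start][0], build(0, n, start)

if __name__ == "__main__":
    # Hypothetical toy counts (not from the Sinica Treebank), with a PP-attachment ambiguity.
    toy_counts = {
        ("S", ("NP", "VP")): 4, ("VP", ("V", "NP")): 3, ("VP", ("VP", "PP")): 1,
        ("NP", ("NP", "PP")): 1, ("PP", ("P", "NP")): 2,
        ("NP", ("I",)): 2, ("NP", ("man",)): 2, ("NP", ("telescope",)): 1,
        ("V", ("saw",)): 3, ("P", ("with",)): 2,
    }
    print(viterbi_cyk("I saw man with telescope".split(), estimate_pcfg(toy_counts)))
```

With these toy counts, the parser prefers attaching the PP to the VP. That tree's product of rule probabilities, (1/3)·0.25·0.75·(1/3)·(1/6), exceeds the noun-attachment reading's (1/3)·0.75·(1/6)·(1/3)·(1/6). Keeping a single maximum per nonterminal in each cell is safe because a PCFG's tree score decomposes into independent subtree scores.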
In conclusion, the experimental results show 89% consistency with the Sinica Treebank. The system can therefore serve as an automated syntax parser and as an effective tool for other NLP-related technologies.
|
author2 | CHANG, TAO-HSING |
author | WU, ZHEN-HUI 吳振輝 |
title | Development and Implementation of Chinese Syntax Parser |