Development and Implementation of Chinese Syntax Parser

Master's thesis, National Kaohsiung University of Applied Sciences (國立高雄應用科技大學), Department of Computer Science and Information Engineering (資訊工程系), ROC academic year 106.

Bibliographic Details
Main Author: WU, ZHEN-HUI, 吳振輝
Other Author: CHANG, TAO-HSING, 張道行
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/8uq85t
Record ID: ndltd-TW-105KUAS0392003
Chinese Title: 中文句法剖析器之設計與實作 (Design and Implementation of a Chinese Syntactic Parser)
Type: thesis
Collection: NDLTD
Abstract: Natural-language processing (NLP) is an active research area, and the syntactic parser is one of its basic tools: it lets a computer recover the structure of a sentence, which in turn allows more advanced processing of natural language. Sentence parsing is therefore an indispensable technique in NLP. The main purpose of this study is to build a Chinese syntactic parser. We extracted the information needed to establish syntactic rules from 61,087 Chinese parse trees in the Sinica Treebank and used it to build a syntactic rule database for the later parsing stages. All rules in the database were then converted to Chomsky normal form (CNF) by virtual substitution. For efficient parsing, we used the Cocke–Younger–Kasami (CYK) algorithm, a dynamic-programming method that parses in polynomial time (cubic in sentence length). Parsing a sentence typically yields several ambiguous structures, all consistent with the sentence, and selecting the most plausible one among them is difficult. To address this, each rule in the database is assigned a probability using the probabilistic context-free grammar (PCFG) method, so that every candidate structure can be scored by its probability and the most probable one chosen. In experiments, the parser achieved 89% consistency with the Sinica Treebank. The system is thus an automatic syntactic parser that can serve as an effective basic tool for other NLP technologies.
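The abstract states that rule information was extracted from Sinica Treebank parse trees and that each rule was given a probability under a PCFG, but it does not say how those probabilities were estimated. The sketch below assumes the common relative-frequency (maximum-likelihood) estimate, P(A -> rhs) = count(A -> rhs) / count(A). The nested-tuple tree format, the functions `count_rules` and `estimate_pcfg`, and the two-sentence toy treebank are hypothetical illustrations, not the thesis's code or the Sinica Treebank's actual notation.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical tree format for illustration (not the Sinica Treebank's own notation):
# a node is a tuple (label, child, child, ...) and a leaf child is a plain string (the word).
Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand-side symbols)


def count_rules(tree: tuple, counts: Counter) -> None:
    """Add one count for every production A -> X1 ... Xk used in this tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def estimate_pcfg(trees) -> Dict[Rule, float]:
    """Relative-frequency (maximum-likelihood) estimate:
    P(A -> rhs) = count(A -> rhs) / count(all rules whose left-hand side is A)."""
    counts: Counter = Counter()
    for tree in trees:
        count_rules(tree, counts)
    lhs_totals: Counter = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy treebank of two sentences.
    toy_treebank = [
        ("S", ("NP", "我"), ("VP", ("V", "喜歡"), ("NP", "貓"))),
        ("S", ("NP", "他"), ("VP", "跑")),
    ]
    for (lhs, rhs), p in sorted(estimate_pcfg(toy_treebank).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```

On this toy treebank `NP` is expanded three times into three different words, so each `NP` rule gets probability 1/3, while `S -> NP VP` gets 1.0 because it is the only `S` rule observed.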
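The abstract says the rules were brought into Chomsky normal form by "virtual substitution" but gives no further detail. One standard way to handle the main obstacle, rules with more than two right-hand-side symbols, is right-binarization with new intermediate (virtual) nonterminals; the sketch below assumes that approach, which may differ from the thesis's procedure. The `binarize` function and the `A|C.D` naming of virtual symbols are hypothetical, and unary rules and rules that mix words with nonterminals are left unconverted, so this covers only one part of a full CNF conversion.

```python
from typing import Dict, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand-side symbols)


def binarize(rules: Dict[Rule, float]) -> Dict[Rule, float]:
    """Right-binarize every rule whose right-hand side has more than two symbols.

    A -> B C D (p) becomes A -> B A|C.D (p) plus A|C.D -> C D (1.0).
    The "A|C.D" naming of the virtual nonterminals is a hypothetical scheme.
    Unary rules (A -> B) pass through untouched: this is only the
    binarization part of a full Chomsky-normal-form conversion.
    """
    out: Dict[Rule, float] = {}
    for (lhs, rhs), p in rules.items():
        head, prob = lhs, p
        while len(rhs) > 2:
            # Name the virtual symbol after the original lhs and the remaining suffix,
            # so rules of the same lhs with identical suffixes share one virtual symbol.
            virtual = f"{lhs}|{'.'.join(rhs[1:])}"
            out[(head, (rhs[0], virtual))] = prob
            # Each virtual symbol has exactly one expansion, so it gets probability 1.0.
            head, prob, rhs = virtual, 1.0, rhs[1:]
        out[(head, rhs)] = prob
    return out


if __name__ == "__main__":
    toy_rules = {("S", ("NP", "VP", "PP")): 0.4, ("S", ("NP", "VP")): 0.6}
    for (lhs, rhs), p in binarize(toy_rules).items():
        print(f"{lhs} -> {' '.join(rhs)}  {p}")
```

Giving each virtual rule probability 1.0 keeps the probability of every derivation equal to that of the original rule, so the binarized grammar can be passed directly to a probabilistic parser.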
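The CYK and PCFG steps described in the abstract fit together as a probabilistic (Viterbi) CYK parser: the dynamic-programming chart keeps, for each span of words and each nonterminal, only the highest-probability analysis, so the most probable of the ambiguous structures is found without enumerating them all. The sketch below assumes a grammar already in CNF, encoded as two hypothetical dictionaries (`binary` and `lexical`); the function name `viterbi_cyk` and the toy English grammar in the demo are illustrations, not taken from the thesis. It sums log-probabilities instead of multiplying raw probabilities to avoid underflow on long sentences.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumed grammar encoding (hypothetical): a PCFG already in Chomsky normal form,
# split into binary rules A -> B C and lexical rules A -> word.
Binary = Dict[Tuple[str, str, str], float]  # (A, B, C) -> P(A -> B C)
Lexical = Dict[Tuple[str, str], float]      # (A, word) -> P(A -> word)


def viterbi_cyk(words: List[str], binary: Binary, lexical: Lexical,
                start: str = "S") -> Optional[Tuple[float, tuple]]:
    """Most probable parse of `words` as (log-probability, tree), or None if none exists.

    best[i][j][A] is the best log-probability of nonterminal A spanning words[i:j];
    back[i][j][A] records how that entry was built. Time is O(n^3 * |binary|).
    """
    n = len(words)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    # Spans of length 1: apply lexical rules A -> word.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w and p > 0 and math.log(p) > best[i][i + 1].get(a, -math.inf):
                best[i][i + 1][a] = math.log(p)
                back[i][i + 1][a] = w

    # Longer spans: combine two adjacent sub-spans with a binary rule A -> B C.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                left, right = best[i][k], best[k][j]
                for (a, b, c), p in binary.items():
                    if b in left and c in right and p > 0:
                        score = math.log(p) + left[b] + right[c]
                        if score > best[i][j].get(a, -math.inf):
                            best[i][j][a] = score
                            back[i][j][a] = (k, b, c)

    if start not in best[0][n]:
        return None

    def build(i: int, j: int, a: str) -> tuple:
        entry = back[i][j][a]
        if isinstance(entry, str):  # lexical rule: a leaf (label, word)
            return (a, entry)
        k, b, c = entry
        return (a, build(i, k, b), build(k, j, c))

    return best[0][n][start], build(0, n, start)


if __name__ == "__main__":
    # Hypothetical toy grammar with a prepositional-phrase attachment ambiguity.
    binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.6, ("VP", "VP", "PP"): 0.4,
              ("NP", "NP", "PP"): 0.2, ("PP", "P", "NP"): 1.0}
    lexical = {("NP", "she"): 0.3, ("NP", "stars"): 0.25, ("NP", "telescopes"): 0.25,
               ("V", "saw"): 1.0, ("P", "with"): 1.0}
    result = viterbi_cyk("she saw stars with telescopes".split(), binary, lexical)
    if result is not None:
        log_p, tree = result
        print(math.exp(log_p), tree)
```

With the demo grammar, attaching "with telescopes" to the verb phrase has probability 0.0045 and attaching it to the noun phrase has probability 0.00225, so the parser returns the verb-phrase reading. Because each chart cell keeps one entry per nonterminal, the running time stays cubic in sentence length even when the number of possible parses grows exponentially.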