Development and Implementation of Chinese Syntax Parser

Master's (碩士) === 國立高雄應用科技大學 === 資訊工程系 === 106 === Natural-language processing (NLP) is a popular research topic. The syntax parser in this field is an important basic tool that enables the computer to obtain the structure of the sentence. Once the sentence structure is obtained, the computer can then process...

Bibliographic Details
Main Authors:	WU, ZHEN-HUI, 吳振輝
Other Authors:	CHANG, TAO-HSING
Format:	Others
Language:	zh-TW
Published:	2018
Online Access:	http://ndltd.ncl.edu.tw/handle/8uq85t

id	ndltd-TW-105KUAS0392003
record_format	oai_dc
spelling	ndltd-TW-105KUAS0392003 2019-05-16T00:44:37Z http://ndltd.ncl.edu.tw/handle/8uq85t Development and Implementation of Chinese Syntax Parser 中文句法剖析器之設計與實作 WU, ZHEN-HUI 吳振輝 碩士 國立高雄應用科技大學 資訊工程系 106 CHANG, TAO-HSING 張道行 2018 學位論文 ; thesis 54 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's (碩士) === 國立高雄應用科技大學 === 資訊工程系 === 106 === Natural-language processing (NLP) is a popular research topic, and the syntax parser is one of its basic tools: it lets a computer recover the structure of a sentence. Once the sentence structure is available, the computer can process natural language in more advanced ways, so sentence parsing is an indispensable NLP technique. The main purpose of this study is to build a Chinese syntax parser. We extracted the information needed to establish syntactic rules from 61,087 Chinese parse trees in the Sinica Treebank and used it to build a syntactic rule database for later parsing. All rules in the database were then converted to Chomsky normal form via virtual substitution. To improve parsing performance, we used the Cocke–Younger–Kasami (CYK) algorithm, which applies dynamic programming to reduce the time complexity of parsing. Parsing a sentence usually yields several candidate syntactic structures, all of which are consistent with the sentence, and it is difficult to select the most reasonable one among them. To solve this problem, each syntactic rule in the database is assigned a probability using the probabilistic context-free grammar (PCFG) method. Each candidate structure can then be scored by its probability, and the most reasonable structure is chosen for the sentence. In our experiments, the parser's output reached 89% consistency with the Sinica Treebank. The system is therefore an automated syntax parser that can serve as an effective tool for other NLP-related technologies.
author2	CHANG, TAO-HSING
author_facet	CHANG, TAO-HSING WU, ZHEN-HUI 吳振輝
author	WU, ZHEN-HUI 吳振輝
spellingShingle	WU, ZHEN-HUI 吳振輝 Development and Implementation of Chinese Syntax Parser
author_sort	WU, ZHEN-HUI
title	Development and Implementation of Chinese Syntax Parser
title_short	Development and Implementation of Chinese Syntax Parser
title_full	Development and Implementation of Chinese Syntax Parser
title_fullStr	Development and Implementation of Chinese Syntax Parser
title_full_unstemmed	Development and Implementation of Chinese Syntax Parser
title_sort	development and implementation of chinese syntax parser
publishDate	2018
url	http://ndltd.ncl.edu.tw/handle/8uq85t
work_keys_str_mv	AT wuzhenhui developmentandimplementationofchinesesyntaxparser AT wúzhènhuī developmentandimplementationofchinesesyntaxparser AT wuzhenhui zhōngwénjùfǎpōuxīqìzhīshèjìyǔshízuò AT wúzhènhuī zhōngwénjùfǎpōuxīqìzhīshèjìyǔshízuò
_version_	1719169856510623744
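
The pipeline described in the abstract (rules read off treebank trees, a probability per rule, CYK parsing over a Chomsky-normal-form grammar, and selection of the most probable structure) can be illustrated with a minimal sketch. This is not the thesis's implementation: the two toy trees, the labels, the function names `count_rules`, `estimate_pcfg`, and `cyk_best_parse`, and the relative-frequency estimate are all assumptions made for illustration. The sketch also assumes the input is already word-segmented and that the trees are already binary, so no conversion step is shown.

```python
from collections import defaultdict
import math

# Hypothetical toy treebank (NOT Sinica Treebank data).
# A tree is (label, word) for a leaf or (label, left, right) otherwise,
# so every rule read off these trees is already in Chomsky normal form.
TREES = [
    ("S", ("NP", "我"), ("VP", ("V", "喜歡"), ("NP", "貓"))),
    ("S", ("NP", "貓"), ("VP", ("V", "喜歡"), ("NP", "我"))),
]

def count_rules(tree, counts):
    """Count every rule (lhs, rhs) used in a tree; rhs is a tuple."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1  # lexical rule: label -> word
        return
    counts[(label, tuple(child[0] for child in children))] += 1
    for child in children:
        count_rules(child, counts)

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = defaultdict(int)
    for tree in trees:
        count_rules(tree, counts)
    totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}

def cyk_best_parse(words, pcfg, start="S"):
    """Probabilistic CYK: return (log_prob, tree) of the best parse, or None."""
    n = len(words)
    # chart[(i, j)][label] = (log_prob, tree) for the best parse of words[i:j]
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for (lhs, rhs), p in pcfg.items():
            if rhs == (word,):
                chart[(i, i + 1)][lhs] = (math.log(p), (lhs, word))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (lhs, rhs), p in pcfg.items():
                    if len(rhs) != 2:
                        continue
                    left = chart[(i, k)].get(rhs[0])
                    right = chart[(k, j)].get(rhs[1])
                    if left is None or right is None:
                        continue
                    score = math.log(p) + left[0] + right[0]
                    best = chart[(i, j)].get(lhs)
                    if best is None or score > best[0]:
                        chart[(i, j)][lhs] = (score, (lhs, left[1], right[1]))
    return chart[(0, n)].get(start)

pcfg = estimate_pcfg(TREES)
best = cyk_best_parse(["我", "喜歡", "貓"], pcfg)
```

Log probabilities are used so that long sentences do not underflow. Keeping only the best entry per label and span is what lets the dynamic program choose a single structure among the ambiguous candidates without enumerating all of them.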

Similar Items