An enhanced writer language model for Chinese historical corpora

Master's === National Chengchi University === Department of Computer Science === 105 === In recent years, digital archiving has grown steadily, and more and more valuable Chinese historical corpora have been selected for preservation. At the same time, preserved corpora face the loss or lack of author information...

Bibliographic Details
Main Authors:	Liang, Shao Zhong, 梁韶中
Other Authors:	Tsai, Ming Feng
Format:	Others
Language:	zh-TW
Online Access:	http://ndltd.ncl.edu.tw/handle/c47ph3

id	ndltd-TW-105NCCU5394032
record_format	oai_dc
spelling	ndltd-TW-105NCCU5394032 2019-05-15T23:39:15Z http://ndltd.ncl.edu.tw/handle/c47ph3 An enhanced writer language model for Chinese historical corpora 適用於中文史料文本之作者語言模型分析方法研究 Liang, Shao Zhong 梁韶中 Master's, National Chengchi University, Department of Computer Science, 105 Tsai, Ming Feng 蔡銘峰 學位論文 ; thesis 35 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Chengchi University === Department of Computer Science === 105 === In recent years, digital archiving has grown steadily, and more and more valuable Chinese historical corpora have been selected for preservation. At the same time, preserved corpora face the loss or lack of author information, which affects their integrity. Authorship analysis of Chinese historical texts is done mainly by building language models: a dedicated language model is trained for each candidate author, and a smoothing method is applied to avoid zero word probabilities and the calculation errors they cause. This thesis mainly adopts interpolated modified Kneser-Ney smoothing, which takes into account the frequencies of both higher-order and lower-order n-grams and has therefore become a very popular general-purpose choice for building language models. Merging all articles by each candidate author into a single language model ignores many features. In addition to the text of the historical corpora, this thesis therefore integrates metadata into the analysis, including statistics on manually annotated subject categories, so that the constructed language model better fits the text under test and prediction accuracy improves. Custom words are also added to match proper nouns. Furthermore, on top of the general language model, a weight for long words is added in order to determine how word length relates to prediction accuracy. Finally, recursive neural network language models are also used to predict authors, and their results are compared with those of the traditional language model analysis.
author2	Tsai, Ming Feng
author_facet	Tsai, Ming Feng Liang, Shao Zhong 梁韶中
author	Liang, Shao Zhong 梁韶中
spellingShingle	Liang, Shao Zhong 梁韶中 An enhanced writer language model for Chinese historical corpora
author_sort	Liang, Shao Zhong
title	An enhanced writer language model for Chinese historical corpora
title_short	An enhanced writer language model for Chinese historical corpora
title_full	An enhanced writer language model for Chinese historical corpora
title_fullStr	An enhanced writer language model for Chinese historical corpora
title_full_unstemmed	An enhanced writer language model for Chinese historical corpora
title_sort	enhanced writer language model for chinese historical corpora
url	http://ndltd.ncl.edu.tw/handle/c47ph3
work_keys_str_mv	AT liangshaozhong anenhancedwriterlanguagemodelforchinesehistoricalcorpora AT liángsháozhōng anenhancedwriterlanguagemodelforchinesehistoricalcorpora AT liangshaozhong shìyòngyúzhōngwénshǐliàowénběnzhīzuòzhěyǔyánmóxíngfēnxīfāngfǎyánjiū AT liángsháozhōng shìyòngyúzhōngwénshǐliàowénběnzhīzuòzhěyǔyánmóxíngfēnxīfāngfǎyánjiū AT liangshaozhong enhancedwriterlanguagemodelforchinesehistoricalcorpora AT liángsháozhōng enhancedwriterlanguagemodelforchinesehistoricalcorpora
_version_	1719150193347133440
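
The per-author language model approach summarized in the description can be illustrated with a small sketch. It is not the thesis's implementation: it assumes character-level bigrams, uses interpolated Kneser-Ney smoothing with a single fixed discount rather than the modified variant with several discounts, and omits the metadata, custom-word, and long-word weighting steps. The names `BigramKN` and `attribute`, and the parameters `discount`, `vocab_size`, and `eps` (a small uniform floor for characters never seen in training), are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


class BigramKN:
    """Character-bigram model, interpolated Kneser-Ney with one discount (illustrative)."""

    def __init__(self, texts, discount=0.75, vocab_size=10000, eps=1e-3):
        self.d = discount
        self.vocab_size = vocab_size  # assumed size of the character inventory
        self.eps = eps                # assumed weight of the uniform floor
        self.bigrams = Counter()
        self.context_totals = Counter()
        followers = defaultdict(set)
        predecessors = defaultdict(set)
        for text in texts:
            chars = ["<s>"] + list(text) + ["</s>"]
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context_totals[prev] += 1
                followers[prev].add(cur)
                predecessors[cur].add(prev)
        self.n_followers = {c: len(s) for c, s in followers.items()}
        self.n_predecessors = {c: len(s) for c, s in predecessors.items()}
        self.n_bigram_types = len(self.bigrams)

    def _lower(self, cur):
        # Continuation probability: share of distinct bigram types ending in `cur`,
        # mixed with a uniform floor so unseen characters keep non-zero mass.
        cont = self.n_predecessors.get(cur, 0) / self.n_bigram_types
        return (1 - self.eps) * cont + self.eps / self.vocab_size

    def prob(self, prev, cur):
        total = self.context_totals.get(prev, 0)
        if total == 0:
            # Unseen context: rely on the lower-order estimate alone.
            return self._lower(cur)
        discounted = max(self.bigrams.get((prev, cur), 0) - self.d, 0) / total
        backoff_mass = self.d * self.n_followers[prev] / total
        return discounted + backoff_mass * self._lower(cur)

    def log_prob(self, text):
        chars = ["<s>"] + list(text) + ["</s>"]
        return sum(math.log(self.prob(p, c)) for p, c in zip(chars, chars[1:]))


def attribute(text, models):
    """Return the candidate author whose model gives `text` the highest log-probability."""
    return max(models, key=lambda author: models[author].log_prob(text))
```

In use, one model would be trained per candidate author, for example `models = {name: BigramKN(texts) for name, texts in corpus.items()}` with `corpus` a hypothetical mapping from author names to lists of their texts, and an unattributed text would be passed to `attribute`. For long texts a length-normalized score such as perplexity may be preferable to the raw log-probability.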
