Design and Implementation of a Chinese Document Integration System for Game Forums (設計與實作一個針對遊戲論壇的中文文章整合系統)

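The internet has made exchanging information easier and more varied: people no longer rely only on the news, because forums let anyone share information quickly and with a low barrier to entry. That same openness has made the volume of information explode, so even with a search engine, users still spend considerable effort collecting, filtering, and processing material on a specific topic. Taking the League of Legends discussion board (英雄聯盟哈拉討論板) on Bahamut (巴哈姆特電玩資訊站) as a case study, this research aims to give users a comprehensive yet concise description of a game character, so that they come away with at least a general understanding of it. Drawing on work in web forum mining and on news document summarization systems, it designs a summarization system suited to multiple forum posts. The study first analyzes the characteristics of the forum, experiments with how to mine latent information from it, and identifies the difficulties that forum mining runs into. Based on that analysis, the system is organized into roughly three phases: 1. Data preprocessing...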


Bibliographic Details
Main Authors:	黃重鈞, Huang, Chung Chun
Language:	Chinese
Published:	National Chengchi University (國立政治大學)
Subjects:	Chinese game forum; summary; keyword selection; K-means clustering
Online Access:	http://thesis.lib.nccu.edu.tw/cgi-bin/cdrfb3/gsweb.cgi?o=dstdcdr&i=sid=%22G0101753024%22.

id	ndltd-CHENGCHI-G0101753024
record_format	oai_dc
spelling	ndltd-CHENGCHI-G0101753024 2016-07-14T03:30:35Z Design and Implementation of a Chinese Document Integration System for Game Forums (設計與實作一個針對遊戲論壇的中文文章整合系統) 黃重鈞 Huang, Chung Chun National Chengchi University (國立政治大學) http://thesis.lib.nccu.edu.tw/cgi-bin/cdrfb3/gsweb.cgi?o=dstdcdr&i=sid=%22G0101753024%22. text Chinese Copyright © nccu library on behalf of the copyright holders
collection	NDLTD
language	Chinese
sources	NDLTD
topic	Chinese game forum; summary; keyword selection; K-means clustering
description	The internet has made exchanging information easier and more varied: people no longer rely only on the news, because forums let anyone share information quickly and with a low barrier to entry. That same openness has made the volume of information explode, so even with a search engine, users still spend considerable effort collecting, filtering, and processing material on a specific topic, often more than a person can reasonably handle. Taking the League of Legends discussion board (英雄聯盟哈拉討論板) on Bahamut (巴哈姆特電玩資訊站) as a case study, this research aims to give users a comprehensive yet concise description of a game character, so that they come away with at least a general understanding of it. Drawing on work in web forum mining and on English news summarization systems, it designs a summarization system suited to multiple forum posts. The study first analyzes the characteristics of the forum, experiments with how to mine latent information from it, and identifies the difficulties that forum mining runs into. Based on that analysis, the system is organized into roughly three phases. 1. Data preprocessing: forum posts differ from news articles, and nouns and verbs cannot easily be taken directly as keywords, so TF-IDF rather than part of speech is used to pick out representative terms, which serve as the dimensions of the sentence vector space. 2. Clustering: K-means clustering over those sentence vectors determines which sentences are similar and groups them into the same cluster. 3. Sentence selection: given the clustering result, sentences are weighted and ranked by two features, their keyword content and TF-IDF, and the ones that best represent the document set are selected. An experiment on data collected from the forum surfaced useful related information along the way. The thesis closes with possible improvements, in the hope that the techniques and analysis can support better forum summarization and classification systems in the future.
author	黃重鈞 Huang, Chung Chun
title	Design and Implementation of a Chinese Document Integration System for Game Forums (設計與實作一個針對遊戲論壇的中文文章整合系統)
publisher	National Chengchi University (國立政治大學)
url	http://thesis.lib.nccu.edu.tw/cgi-bin/cdrfb3/gsweb.cgi?o=dstdcdr&i=sid=%22G0101753024%22.
_version_	1718346859112562688
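
The three phases named in the abstract (TF-IDF keyword selection, K-means clustering of sentence vectors, and per-cluster sentence selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: sentences are assumed to be already segmented into tokens, each sentence is treated as one "document" for IDF, the sentence score is assumed to be a plain sum of keyword count and TF-IDF mass, and K-means uses evenly spaced initial centroids so the result is reproducible. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def idf_table(sentences):
    """IDF per term, treating each tokenized sentence as a document (assumption)."""
    n = len(sentences)
    df = Counter(t for s in sentences for t in set(s))
    return {t: math.log(n / df[t]) for t in df}


def select_keywords(sentences, idf, top_k):
    """Phase 1: keep the top_k terms by TF-IDF summed over all sentences."""
    score = Counter()
    for s in sentences:
        for t, c in Counter(s).items():
            score[t] += c / len(s) * idf[t]
    # Sort by score (descending), then by term, so ties are broken consistently.
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_k]]


def vectorize(sentence, keywords, idf):
    """TF-IDF vector of one sentence over the selected keyword dimensions."""
    tf = Counter(sentence)
    return [tf[k] / len(sentence) * idf[k] for k in keywords]


def kmeans(vectors, k, iters=20):
    """Phase 2: plain Lloyd's K-means; requires k <= len(vectors)."""
    n = len(vectors)
    # Evenly spaced initial centroids (an assumption made for reproducibility).
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize(sentences, k=2, top_k=6):
    """Phase 3: return the index of one representative sentence per cluster."""
    idf = idf_table(sentences)
    keywords = select_keywords(sentences, idf, top_k)
    kw_set = set(keywords)
    vectors = [vectorize(s, keywords, idf) for s in sentences]
    labels = kmeans(vectors, k)

    def score(i):
        # Assumed combination: number of keyword tokens plus summed TF-IDF.
        return sum(1 for t in sentences[i] if t in kw_set) + sum(vectors[i])

    picked = []
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            picked.append(max(members, key=score))
    return sorted(picked)


# Made-up toy data: four pre-tokenized "sentences" about two characters.
sentences = [
    ["ahri", "charm", "burst"],
    ["ahri", "charm", "mid"],
    ["garen", "spin", "tank"],
    ["garen", "spin", "top"],
]
print(summarize(sentences, k=2, top_k=8))
```

For real forum posts, a Chinese word segmenter would have to run before `idf_table`; the record does not say which one the thesis used, so that step is left out of the sketch.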


Similar Items