Hot Topic Extraction with Timeline Analysis and Multidimensional Sentence Modeling

Bibliographic Details
Main Authors: Kuan-Yu Chen, 陳冠宇
Other Authors: 曹承礎
Format: Others
Language: en_US
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/19006771268255057281
id ndltd-TW-093NTU05396056
record_format oai_dc
spelling ndltd-TW-093NTU05396056 2015-10-13T11:12:50Z http://ndltd.ncl.edu.tw/handle/19006771268255057281 Hot Topic Extraction with Timeline Analysis and Multidimensional Sentence Modeling 以時間分析與多維度語句呈現為基礎之熱門話題萃取 Kuan-Yu Chen 陳冠宇 Master's thesis, National Taiwan University, Graduate Institute of Information Management, 93 曹承礎 2005 thesis 55 en_US
collection NDLTD
sources NDLTD
description Master's thesis === National Taiwan University === Graduate Institute of Information Management === 93 === Topic detection is part of the Topic Detection and Tracking field, which seeks to develop technologies that search, organize, and structure news-oriented textual materials from various broadcast news media. We are interested in detecting “hot” topics, that is, topics that are frequently discussed in a given period of time. Prior work on hot topic extraction introduced an innovative term-weighting scheme called TF*PDF, which extracts the “hot” terms that describe hot topics. One problem with extracting hot topics using TF*PDF is that the results are unreliable when the weight is determined solely by term frequency and document frequency. Another problem is that a single vector misrepresents the meaning of a sentence. We propose a hot topic extraction system that aims to solve both problems. First, we extract hot terms by capturing how their distribution varies along a timeline; in other words, tracking the life cycle of each term helps us distinguish the real hot terms that describe a hot topic. Second, we use multi-dimensional sentence vectors to represent the information in a sentence. Finally, we group the sentences of news reports into clusters, each of which represents a hot topic. Clustering the sentences by their multi-dimensional sentence vectors not only improves the quality of each cluster but also extracts most of the actual hot topics over a period of time.
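For readers unfamiliar with the TF*PDF baseline that the abstract refers to, the following is a minimal sketch of the weighting as it is commonly stated in the literature: a term's weight is the sum, over news channels, of its normalized frequency in the channel multiplied by the exponential of the fraction of that channel's documents containing it. The function name `tf_pdf`, the input layout (a dict of channels, each a list of tokenized documents), and the sample data are assumptions made for illustration; this record does not give the thesis's own implementation, and the timeline analysis and sentence vectors it proposes are not shown here.

```python
import math
from collections import Counter, defaultdict


def tf_pdf(channels):
    """Return a dict mapping each term to its TF*PDF weight.

    channels: dict of channel name -> list of documents, where each
    document is a list of tokens (hypothetical input layout).
    """
    weights = defaultdict(float)
    for docs in channels.values():
        if not docs:
            continue
        term_freq = Counter()  # occurrences of each term in this channel
        doc_freq = Counter()   # number of documents containing each term
        for doc in docs:
            term_freq.update(doc)
            doc_freq.update(set(doc))
        # Euclidean norm of the channel's term-frequency vector.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue
        n_docs = len(docs)
        for term, freq in term_freq.items():
            # Normalized frequency times exp(proportional document frequency).
            weights[term] += (freq / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


# Toy data: two channels, three short tokenized documents.
sample = {
    "channel_a": [["quake", "rescue"], ["quake"]],
    "channel_b": [["quake", "election"]],
}
weights = tf_pdf(sample)
hottest = max(weights, key=weights.get)
```

In this toy input, "quake" appears in every document of both channels, so it receives the largest weight. Because the weight depends only on term frequency and document frequency, it carries no information about when the term was used, which is the limitation the abstract sets out to address.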