Summary: | PhD === National Taiwan University === Graduate Institute of Electrical Engineering === 95 === The World Wide Web (WWW) has become a major information source for people from all walks of life. Although the WWW facilitates information distribution, the ever-increasing volume of Internet documents has made information discovery a time-consuming task. To manage this massive volume of information efficiently, there is a critical need for methods that detect and summarize events from text streams.
In this dissertation, we provide two adaptive methods to detect sequential events from text streams. We first propose an aging theory to model the life cycle of events. We then present an event detection framework called LIPED, which uses HMM-based life profiles to predict the activeness status of events and adjust detection thresholds adaptively. To help users comprehend the development of news topics, we also provide a unified mechanism that constructs a topic evolution graph and a summary from topic documents. Experimental results on the official TDT4 corpus show that the proposed event detection methods substantially improve on the performance of well-known existing event detection approaches, and that the composed topic summaries and evolution graphs are highly representative.
|