A Study on Multiple Document Summarization Systems

Bibliographic Details
Main Authors: June-Jei Kuo, 郭俊桔
Other Authors: 陳信希
Format: Others
Language: en_US
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/84838785413103052977
Description
Summary: Ph.D. === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 94 === To provide a generic summary that helps online readers absorb news from multiple sources, this dissertation studies the main issues in multi-document summarization (event clustering, sentence selection, redundancy avoidance, sentence ordering, and summary evaluation) and focuses on two major modules: event clustering and summary generation. Besides conventional features such as lexical information and part-of-speech, the term frequency, document frequency, and paragraph dispersion of a word in a document are used to identify informative words, which represent the corresponding document. In the event clustering module, we introduce semantic features, such as event words and co-reference chains, to capture more of a document's meaning. We also propose mining a controlled vocabulary from co-reference chains to solve the cross-document named entity unification problem, and a novel dynamic threshold model to improve event clustering performance. In the summary generation module, we propose a temporal tagger that resolves temporal expressions and provides sentence dates for sentence ordering. We apply latent semantic analysis (LSA) to sentence selection and, to handle the summary length constraint, propose a sentence reduction algorithm that uses both event constituent words and informative words. Experimental results on both the content and the readability of the generated multi-document summaries are promising. To further investigate the proposed semantic features, we also study headline generation and multilingual multi-document summarization. Finally, we address automatic summary evaluation by introducing question answering (QA), again with promising results.
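
The summary names three signals for picking informative words (term frequency, document frequency, and paragraph dispersion) but does not say how they are combined. The sketch below is only an illustration under an assumed scoring rule: term frequency times a smoothed inverse document frequency times the fraction of paragraphs containing the word. The function name `informative_words`, its parameters, and the formula itself are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def informative_words(doc_paragraphs, corpus_docs, top_k=3):
    """Rank the words of one document by an ASSUMED tf * idf * dispersion score.

    doc_paragraphs: the target document as a list of paragraphs,
                    each paragraph a list of tokens.
    corpus_docs:    all documents (target included), each a flat token list.
    """
    tokens = [w for para in doc_paragraphs for w in para]
    if not tokens:
        return []

    tf = Counter(tokens)                       # term frequency in the document
    doc_sets = [set(d) for d in corpus_docs]   # for document frequency
    para_sets = [set(p) for p in doc_paragraphs]
    n_docs = len(corpus_docs)
    n_paras = len(doc_paragraphs)

    scores = {}
    for word, freq in tf.items():
        df = sum(1 for s in doc_sets if word in s)
        # Smoothed inverse document frequency (an assumption, not the thesis formula).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        # Paragraph dispersion: share of paragraphs in which the word occurs.
        dispersion = sum(1 for s in para_sets if word in s) / n_paras
        scores[word] = freq * idf * dispersion

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_k]]
```

Under this assumed rule, a word that recurs across most paragraphs of one news story but is rare in the rest of the collection ranks highest, which matches the intuition that such words represent the document.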