Inverted File Compression in Scaleable Information Systems By Posting Clustering

Bibliographic Details
Main Authors: Chung Hung Lai, 賴宗鴻
Other Authors: Tien-Fu Chan
Format: Others
Language: en_US
Published: 2000
Online Access: http://ndltd.ncl.edu.tw/handle/69005222828234331880
Description
Summary: Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 88 === Information retrieval systems become bloated and slow as the amount of information grows exponentially. The main data structure of an information retrieval system is the index, which records the positions of each term in the documents. The most widely used indexing technique is the inverted file, which records, for each term, the documents that contain it. As the number of terms and documents increases, the index grows as well. To reduce the size of the inverted file without sacrificing query efficiency, we propose posting clustering schemes. We collect posting sequences into clusters and encode each cluster with either a binary code or a bit vector, depending on the priorities of the information retrieval system: binary encoding yields a better compression ratio, while bit vector encoding yields better query speed. Disorder among the documents in an information retrieval system often degrades the performance of a compression scheme. We therefore propose document reordering methods that reorganize the documents to help the posting clustering compression schemes. Because most queries in information retrieval systems involve Boolean operations, which are the major application of binary decision diagrams (BDDs), we develop the BDD inverted list to speed up queries. Experimental results show that the cluster schemes and the BDD scheme improve the compression ratios of the inverted files.
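
To make the trade-off in the summary concrete, the following is a minimal Python sketch of an inverted file whose posting lists are split into clusters and encoded either as a bit vector or as fixed-width binary offsets. It is an illustration under stated assumptions, not the scheme from the thesis: the fixed cluster `width`, the function names (`build_inverted_file`, `cluster_postings`, `encode_cluster`), and the whitespace tokenizer are all hypothetical, and the record does not say how the thesis forms or encodes its clusters.

```python
from collections import defaultdict
from math import ceil, log2


def build_inverted_file(docs):
    """Map each term to the sorted IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():  # naive tokenizer (assumption)
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def cluster_postings(postings, width):
    """Group doc IDs into clusters that each cover `width` consecutive IDs.

    Fixed-width clusters are an assumption made for this sketch.
    Returns {cluster_number: [offsets inside that cluster]}.
    """
    clusters = defaultdict(list)
    for doc_id in postings:
        clusters[doc_id // width].append(doc_id % width)
    return dict(clusters)


def encode_cluster(offsets, width, use_bit_vector):
    """Encode one cluster as a string of '0'/'1' characters.

    Bit vector: one bit per possible document in the cluster, so two
    clusters can be intersected with a bitwise AND.
    Binary code: each offset written in ceil(log2(width)) bits, which is
    shorter when the cluster holds only a few postings.
    """
    if use_bit_vector:
        members = set(offsets)
        return "".join("1" if i in members else "0" for i in range(width))
    bits = max(1, ceil(log2(width)))
    return "".join(format(offset, f"0{bits}b") for offset in offsets)


docs = ["inverted file index", "index compression",
        "bit vector index", "query speed"]
index = build_inverted_file(docs)
clusters = cluster_postings([0, 1, 2, 9], width=4)
dense = encode_cluster(clusters[0], 4, use_bit_vector=True)
sparse = encode_cluster(clusters[2], 4, use_bit_vector=False)
```

In this toy setting a dense cluster costs fewer bits as a bit vector, while a sparse cluster costs fewer bits as binary offsets. It also hints at why document order matters: renumbering documents so that those sharing a term receive nearby IDs leaves fewer, denser clusters.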