Inverted File Compression in Scaleable Information Systems By Posting Clustering

Master's === National Chung Cheng University === Institute of Computer Science and Information Engineering === 88 === Information retrieval systems become bloated and slow as the amount of information grows exponentially. The major data structure of an information retrieval system is the index, which records the positions of each term in the documents. The most used index technique i...


Bibliographic Details
Main Authors: Chung Hung Lai, 賴宗鴻
Other Authors: Tien-Fu Chan
Format: Others
Language: en_US
Published: 2000
Online Access: http://ndltd.ncl.edu.tw/handle/69005222828234331880
id ndltd-TW-088CCU00392037
record_format oai_dc
spelling ndltd-TW-088CCU00392037 2015-10-13T11:50:28Z http://ndltd.ncl.edu.tw/handle/69005222828234331880 Inverted File Compression in Scaleable Information Systems By Posting Clustering 利用群集文件指標壓縮可擴充式大型資訊系統中之索引檔案 Chung Hung Lai 賴宗鴻 Master's National Chung Cheng University Institute of Computer Science and Information Engineering 88 Information retrieval systems become bloated and slow as the amount of information grows exponentially. The major data structure of an information retrieval system is the index, which records the positions of each term in the documents. The most widely used indexing technique is the inverted file, which records, for each term, the documents that contain it. As the numbers of terms and documents increase, the size of the index also increases. To reduce the size of the inverted file without sacrificing query efficiency, we propose posting clustering schemes. We collect posting sequences into clusters and encode each cluster with either a binary code or a bit vector, according to the emphasis of the information retrieval system. The binary encoding yields a better compression ratio, and the bit vector encoding leads to better query speed. The disorder of the documents in an information retrieval system often degrades the performance of the compression scheme. We therefore propose document reordering methods that reorganize the documents to help the posting clustering compression schemes. Because most queries in information retrieval systems involve Boolean operations, which are the major application of BDD, we develop the BDD inverted list to speed up queries. Experimental results show that the compression ratios of the inverted files are improved by the cluster schemes and the BDD scheme. Tien-Fu Chan 陳添福 2000 thesis 70 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chung Cheng University === Institute of Computer Science and Information Engineering === 88 === Information retrieval systems become bloated and slow as the amount of information grows exponentially. The major data structure of an information retrieval system is the index, which records the positions of each term in the documents. The most widely used indexing technique is the inverted file, which records, for each term, the documents that contain it. As the numbers of terms and documents increase, the size of the index also increases. To reduce the size of the inverted file without sacrificing query efficiency, we propose posting clustering schemes. We collect posting sequences into clusters and encode each cluster with either a binary code or a bit vector, according to the emphasis of the information retrieval system. The binary encoding yields a better compression ratio, and the bit vector encoding leads to better query speed. The disorder of the documents in an information retrieval system often degrades the performance of the compression scheme. We therefore propose document reordering methods that reorganize the documents to help the posting clustering compression schemes. Because most queries in information retrieval systems involve Boolean operations, which are the major application of BDD, we develop the BDD inverted list to speed up queries. Experimental results show that the compression ratios of the inverted files are improved by the cluster schemes and the BDD scheme.
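The inverted file and the bit-vector encoding of posting clusters named in the description can be illustrated with a small sketch. It is not the scheme from the thesis: the whitespace tokenizer, the fixed cluster width of 8 document ids, and all function names (`build_inverted_file`, `cluster_postings`, `decode_clusters`, `boolean_and`) are assumptions made for illustration only.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():  # naive tokenizer (assumption)
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def cluster_postings(postings, width=8):
    """Group doc ids into fixed-width clusters, one bit vector (int) per cluster.

    Returns {cluster_number: bit_vector}; bit k of a vector is set when
    document cluster_number * width + k is in the posting list.
    """
    clusters = {}
    for doc_id in postings:
        block, offset = divmod(doc_id, width)
        clusters[block] = clusters.get(block, 0) | (1 << offset)
    return clusters


def decode_clusters(clusters, width=8):
    """Recover the sorted doc ids from the clustered bit vectors."""
    ids = []
    for block in sorted(clusters):
        bits = clusters[block]
        for offset in range(width):
            if (bits >> offset) & 1:
                ids.append(block * width + offset)
    return ids


def boolean_and(a, b):
    """Intersect two clustered posting lists with one bitwise AND per shared cluster."""
    result = {}
    for block in a.keys() & b.keys():
        bits = a[block] & b[block]
        if bits:  # drop clusters that end up empty
            result[block] = bits
    return result


docs = ["inverted file index", "bit vector index", "inverted index compression"]
index = build_inverted_file(docs)
both = boolean_and(cluster_postings(index["inverted"]),
                   cluster_postings(index["index"]))
print(decode_clusters(both))
```

In this toy layout, documents with nearby ids share a cluster, which is why the order of the documents affects how many clusters a term needs, and a Boolean AND touches only the clusters that both terms have in common.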
author2 Tien-Fu Chan
author_facet Tien-Fu Chan
Chung Hung Lai
賴宗鴻
author Chung Hung Lai
賴宗鴻
spellingShingle Chung Hung Lai
賴宗鴻
Inverted File Compression in Scaleable Information Systems By Posting Clustering
author_sort Chung Hung Lai
title Inverted File Compression in Scaleable Information Systems By Posting Clustering
publishDate 2000
url http://ndltd.ncl.edu.tw/handle/69005222828234331880