Iceberg-cube computation with PC cluster
Iceberg queries constitute one of the most important classes of queries for OLAP applications. This thesis investigates using low cost PC clusters to parallelize the computation of iceberg queries. We concentrate on techniques for querying large, high-dimensional data sets. Our exploration of an...
Main Author: | Yin, Yu |
---|---|
Language: | English |
Published: | 2009 |
Online Access: | http://hdl.handle.net/2429/11930 |
id |
ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-11930 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-11930 2014-03-14T15:45:24Z Iceberg-cube computation with PC cluster Yin, Yu Iceberg queries constitute one of the most important classes of queries for OLAP applications. This thesis investigates using low-cost PC clusters to parallelize the computation of iceberg queries. We concentrate on techniques for querying large, high-dimensional data sets. Our exploration of an algorithmic space considers tradeoffs between parallelism, computation, memory and I/O. The main contribution of this thesis is the development and evaluation of various novel, parallel algorithms for CUBE computation and online aggregation. These include the following: one, the CUBE Algorithm RP, which is a straightforward parallel version of BUC [BR99]; two, the CUBE Algorithm BPP, which attempts to reduce I/O by outputting results in a more efficient way; three, the CUBE Algorithms ASL and AHT, which maintain cells in a cuboid in a skip list and a hash table respectively, designed to put the utmost priority on load balancing; four, alternatively, the CUBE Algorithm PT, which load-balances by using binary partitioning to divide the cube lattice as evenly as possible; and five, the online aggregation algorithm POL, based on ASL and a sampling technique, which gives an instant response followed by progressive refinement. We present a thorough performance evaluation of all these algorithms over a variety of parameters, including the dimensionality and the sparseness of the cube, the selectivity of the constraints, the number of processors, and the size of the data set. The key to understanding the CUBE algorithms is that one algorithm does not fit all. We recommend a "recipe" which uses PT as the default algorithm, but may also deploy ASL or AHT in appropriate circumstances. The online aggregation algorithm, POL, is especially suitable for computing a high-dimensional query over a large data set with a cluster of machines connected by high-speed networks.
2009-08-06T19:38:06Z 2009-08-06T19:38:06Z 2001 2009-08-06T19:38:06Z 2001-11 Electronic Thesis or Dissertation http://hdl.handle.net/2429/11930 eng UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
description |
Iceberg queries constitute one of the most important classes of queries for OLAP
applications. This thesis investigates using low-cost PC clusters to parallelize the
computation of iceberg queries. We concentrate on techniques for querying large,
high-dimensional data sets. Our exploration of an algorithmic space considers tradeoffs
between parallelism, computation, memory and I/O. The main contribution of
this thesis is the development and evaluation of various novel, parallel algorithms
for CUBE computation and online aggregation. These include the following: one,
the CUBE Algorithm RP, which is a straightforward parallel version of BUC [BR99];
two, the CUBE Algorithm BPP, which attempts to reduce I/O by outputting results
in a more efficient way; three, the CUBE Algorithms ASL and AHT, which
maintain cells in a cuboid in a skip list and a hash table respectively, designed
to put the utmost priority on load balancing; four, alternatively, the CUBE Algorithm
PT, which load-balances by using binary partitioning to divide the cube lattice as
evenly as possible; and five, the online aggregation algorithm POL, based on ASL
and a sampling technique, which gives an instant response followed by progressive
refinement.
We present a thorough performance evaluation of all these algorithms over a
variety of parameters, including the dimensionality and the sparseness of the cube, the
selectivity of the constraints, the number of processors, and the size of the data set.
The key to understanding the CUBE algorithms is that one algorithm does not fit
all. We recommend a "recipe" which uses PT as the default algorithm, but may
also deploy ASL or AHT in appropriate circumstances. The online aggregation
algorithm, POL, is especially suitable for computing a high-dimensional query over
a large data set with a cluster of machines connected by high-speed networks. |
author |
Yin, Yu |
title |
Iceberg-cube computation with PC cluster |
publishDate |
2009 |
url |
http://hdl.handle.net/2429/11930 |
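
The abstract names BUC [BR99] as the sequential starting point for the parallel CUBE algorithms. As a rough illustration of what an iceberg cube is, the following minimal, single-machine sketch computes every group-by cell whose COUNT(*) meets a minimum support, partitioning bottom-up and pruning any partition that falls below the threshold. It is not the thesis's RP, BPP, ASL, AHT or PT algorithm. The function name `iceberg_cube`, the parameter `min_support`, the use of `None` as the "ALL" marker, and the sample rows are all assumptions made for this example.

```python
def iceberg_cube(rows, num_dims, min_support):
    """Return {cell: count} for every cell with count >= min_support.

    A cell is a tuple with one entry per dimension; None stands for "ALL"
    (assumption: None never occurs as a real dimension value).
    """
    results = {}

    def expand(partition, start_dim, cell):
        # The caller guarantees len(partition) >= min_support.
        results[cell] = len(partition)
        for d in range(start_dim, num_dims):
            # Partition the current rows on dimension d.
            groups = {}
            for row in partition:
                groups.setdefault(row[d], []).append(row)
            for value, group in groups.items():
                # Pruning step: a partition below the threshold cannot
                # produce any qualifying, more specific cell.
                if len(group) >= min_support:
                    child = cell[:d] + (value,) + cell[d + 1:]
                    expand(group, d + 1, child)

    if len(rows) >= min_support:
        expand(rows, 0, (None,) * num_dims)
    return results


# Hypothetical two-dimensional data set.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
cube = iceberg_cube(rows, num_dims=2, min_support=2)
for cell, count in sorted(cube.items(), key=str):
    print(cell, count)
```

The pruning relies on COUNT being anti-monotone: adding a dimension value to a cell can only shrink its count. A parallel variant would additionally have to decide how to split this recursion or the data across machines, which is the load-balancing question the abstract raises.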