Query Optimization for Database Federation Systems

Database federation is one approach to data integration, in which a middleware component, called a mediator, provides uniform access to a number of heterogeneous data sources. This thesis focuses on query optimization for distributed joins over a database federation. One important observation in query o...


Bibliographic Details
Main Author: Wang, Di
Other Authors: Elke A. Rundensteiner, Reader
Format: Others
Published: Digital WPI 2009
Subjects: database federation; query optimization
Online Access: https://digitalcommons.wpi.edu/etd-theses/718
https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1717&context=etd-theses
id ndltd-wpi.edu-oai-digitalcommons.wpi.edu-etd-theses-1717
record_format oai_dc
spelling ndltd-wpi.edu-oai-digitalcommons.wpi.edu-etd-theses-1717 2019-03-22T05:49:40Z Query Optimization for Database Federation Systems Wang, Di Database federation is one approach to data integration, in which a middleware component, called a mediator, provides uniform access to a number of heterogeneous data sources. This thesis focuses on query optimization for distributed joins over a database federation. One important observation in query optimization over distributed database systems is that run-time conditions (namely available buffer size, CPU utilization on each machine, and the network environment) can significantly affect the execution cost of a query plan. However, very few studies of existing database federation systems have addressed run-time conditions. The problem is challenging because the mediator usually cannot know the run-time conditions of remote sites, and taking run-time conditions into account adds complexity to the optimizer. This thesis proposes the Cluster-and-Conquer algorithm for query optimization over a database federation, which takes run-time conditions into account efficiently. The algorithm has three benefits. First, the run-time conditions of the machines become available to each cluster mediator. Second, each cluster mediator can process its own sub-query concurrently, which reduces the complexity of processing the query plan. Third, the algorithm outperforms related approaches in terms of "cost of costing", because it removes unnecessary inter-cluster operations at an early stage. I have implemented a prototype data federation system with the Cluster-and-Conquer algorithm. The experimental results demonstrate the capabilities and efficiency of the algorithm and identify the target scenarios in which it performs better than related approaches.
2009-05-04T07:00:00Z text application/pdf https://digitalcommons.wpi.edu/etd-theses/718 https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1717&context=etd-theses Masters Theses (All Theses, All Years) Digital WPI Elke A. Rundensteiner, Reader Murali Mani, Advisor database federation query optimization
collection NDLTD
format Others
sources NDLTD
topic database federation
query optimization
description Database federation is one approach to data integration, in which a middleware component, called a mediator, provides uniform access to a number of heterogeneous data sources. This thesis focuses on query optimization for distributed joins over a database federation. One important observation in query optimization over distributed database systems is that run-time conditions (namely available buffer size, CPU utilization on each machine, and the network environment) can significantly affect the execution cost of a query plan. However, very few studies of existing database federation systems have addressed run-time conditions. The problem is challenging because the mediator usually cannot know the run-time conditions of remote sites, and taking run-time conditions into account adds complexity to the optimizer. This thesis proposes the Cluster-and-Conquer algorithm for query optimization over a database federation, which takes run-time conditions into account efficiently. The algorithm has three benefits. First, the run-time conditions of the machines become available to each cluster mediator. Second, each cluster mediator can process its own sub-query concurrently, which reduces the complexity of processing the query plan. Third, the algorithm outperforms related approaches in terms of "cost of costing", because it removes unnecessary inter-cluster operations at an early stage. I have implemented a prototype data federation system with the Cluster-and-Conquer algorithm. The experimental results demonstrate the capabilities and efficiency of the algorithm and identify the target scenarios in which it performs better than related approaches.
author2 Elke A. Rundensteiner, Reader
author_facet Elke A. Rundensteiner, Reader
Wang, Di
author Wang, Di
author_sort Wang, Di
title Query Optimization for Database Federation Systems
publisher Digital WPI
publishDate 2009
url https://digitalcommons.wpi.edu/etd-theses/718
https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1717&context=etd-theses