HYBRIDJOIN for Near Real-time Data Warehousing

In order to make timely and effective decisions, businesses need the latest information from data warehouse repositories. To keep these repositories up-to-date with respect to the end user updates, near real-time data integration is required. An important phase in near real-time data integration is...

Bibliographic Details
Main Authors: Naeem, M (Author), Dobbie, G (Author), Weber, G (Author)
Format: Others
Published: University of Auckland, 2012-04-26T03:51:13Z.
Online Access: Get fulltext
LEADER 01843 am a22001693u 4500
001 4064
042 |a dc 
100 1 0 |a Naeem, M  |e author 
700 1 0 |a Dobbie, G  |e author 
700 1 0 |a Weber, G  |e author 
245 0 0 |a HYBRIDJOIN for Near Real-time Data Warehousing 
260 |b University of Auckland,   |c 2012-04-26T03:51:13Z. 
500 |a Software Engineering, The University of Auckland. (2010, July). Research Report Series (TR Number: UoA-SE-2010-2). Retrieved from (see Publisher's Version). 
520 |a In order to make timely and effective decisions, businesses need the latest information from data warehouse repositories. To keep these repositories up to date with end-user updates, near real-time data integration is required. An important phase in near real-time data integration is data transformation, where the stream of updates is joined with disk-based master data. The stream-based algorithm Hybrid Join (HYBRIDJOIN) performs well in general but has not been optimized for real-world conditions. In real-world markets, a few products are sold much more frequently than the rest; therefore, a large number of sales transactions relate to a small portion of the master data. In the transformation phase, to join the input stream of sales transactions with disk-based master data, HYBRIDJOIN loads that frequently used part of the master data from disk each time, which increases the disk access cost significantly and degrades performance. In contrast, X-HYBRIDJOIN stores that part of the master data permanently in memory, significantly reducing the disk access overhead. To validate these arguments and analyze the performance of X-HYBRIDJOIN, an experimental study is conducted. 
540 |a OpenAccess 
655 7 |a Commissioned Report 
856 |z Get fulltext  |u http://hdl.handle.net/10292/4064
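
Illustration (not part of the catalog record): the abstract contrasts loading frequently used master data from disk on every lookup with keeping that part permanently in memory. The toy Python sketch below shows only that general idea. It is not the authors' HYBRIDJOIN or X-HYBRIDJOIN algorithm, and the names DiskMaster, join_stream and pin_hot_keys, as well as the sample data, are hypothetical.

```python
from collections import Counter


class DiskMaster:
    """Hypothetical stand-in for disk-based master data.

    Each call to load() counts as one simulated disk read.
    """

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def load(self, key):
        self.reads += 1
        return self._rows[key]


def pin_hot_keys(master, sample, n):
    """Load the n most frequent keys of a sample stream into memory once."""
    counts = Counter(key for key, _ in sample)
    return {key: master.load(key) for key, _ in counts.most_common(n)}


def join_stream(stream, master, pinned=None):
    """Join (key, quantity) stream tuples with master rows.

    Keys found in `pinned` are served from memory; all others cost
    one simulated disk read each.
    """
    pinned = pinned or {}
    joined = []
    for key, quantity in stream:
        row = pinned[key] if key in pinned else master.load(key)
        joined.append((key, quantity, row))
    return joined


# Assumed toy data: product 1 dominates the sales stream.
ROWS = {1: "widget", 2: "gadget", 3: "gizmo"}
STREAM = [(1, 5), (1, 2), (2, 1), (1, 7), (3, 1), (1, 4)]
```

With this skewed stream, pinning only the single most frequent product already avoids most simulated disk reads, while the join result stays the same.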