A technique for parallel query optimization using MapReduce framework and a semantic-based clustering method

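Query optimization is the process of identifying the best Query Execution Plan (QEP). The query optimizer produces a close-to-optimal QEP for the given queries based on minimum resource usage. The problem is that a given query has many equivalent execution plans, each with a corresponding execution cost, so producing an effective query plan requires examining a large number of alternatives. Access plan recommendation is an alternative technique to database query optimization that reuses previously generated QEPs to execute new queries. In this technique, the query optimizer uses clustering methods to identify groups of similar queries. However, clustering large query datasets is challenging for traditional clustering algorithms because of the long processing time. Numerous cloud-based platforms, such as Hadoop, Hive, and Pig, have been introduced that offer low-cost solutions for processing distributed queries. This paper applies and tests a model for clustering large query datasets of varying sizes in parallel using MapReduce. The results demonstrate that the parallel implementation of query workload clustering achieves good scalability.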
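
The idea summarized in this record, grouping similar queries with a density-based clustering algorithm run in a map/reduce fashion, can be illustrated with a small sketch. This is not the authors' implementation. Several parts are assumptions made only for illustration: the token-set Jaccard distance stands in for the paper's semantic similarity measure, the partition key (the first table after FROM) is a hypothetical choice that lets each partition be clustered independently and so sidesteps the cross-partition merge step a full distributed DBSCAN would need, and everything runs in one Python process rather than on Hadoop.

```python
import re
from collections import defaultdict


def tokens(sql):
    """Set of lower-cased word tokens; a crude stand-in for semantic features."""
    return frozenset(re.findall(r"[a-z_]+", sql.lower()))


def distance(a, b):
    """Jaccard distance between two token sets (assumed similarity measure)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue             # already assigned
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def map_phase(queries):
    """Map: emit (partition key, query). The key choice is hypothetical."""
    for sql in queries:
        match = re.search(r"\bfrom\s+(\w+)", sql.lower())
        yield (match.group(1) if match else ""), sql


def reduce_phase(key, group, eps, min_pts):
    """Reduce: cluster one partition locally; noise queries map to None."""
    labels = dbscan([tokens(q) for q in group], eps, min_pts)
    return [(q, (key, lab) if lab != -1 else None)
            for q, lab in zip(group, labels)]


def cluster_workload(queries, eps=0.5, min_pts=2):
    """Run map, shuffle (group by key), and reduce over a query workload."""
    groups = defaultdict(list)
    for key, sql in map_phase(queries):
        groups[key].append(sql)
    result = {}
    for key, group in groups.items():  # each partition could go to its own worker
        result.update(reduce_phase(key, group, eps, min_pts))
    return result
```

In an access plan recommendation setting, a new query that falls into an existing cluster could reuse the plan stored for that cluster instead of being optimized from scratch; the values of `eps` and `min_pts` here are arbitrary defaults, not values taken from the paper.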

Bibliographic Details
Main Authors: Elham Azhir, Nima Jafari Navimipour, Mehdi Hosseinzadeh, Arash Sharifi, Aso Darwesh
Format: Article
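Language: English
Published: PeerJ Inc. 2021-06-01
Series: PeerJ Computer Science
Subjects: Query optimization; Access plan recommendation; Cluster computing; Parallel Processing; MapReduce; DBSCAN Algorithm
Online Access: https://peerj.com/articles/cs-580.pdf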
id doaj-609e9e47089a44368122d6c1dcd894f5
record_format Article
spelling doaj-609e9e47089a44368122d6c1dcd894f5
2021-06-03T15:05:28Z
eng
PeerJ Inc.
PeerJ Computer Science
2376-5992
2021-06-01
vol. 7, e580
10.7717/peerj-cs.580
A technique for parallel query optimization using MapReduce framework and a semantic-based clustering method
Elham Azhir (0)
Nima Jafari Navimipour (1)
Mehdi Hosseinzadeh (2)
Arash Sharifi (3)
Aso Darwesh (4)
(0) Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
(1) Future Technology Research Center, National Yunlin University of Science and Technology, Douliou, Yunlin, Taiwan, R.O.C.
(2) Pattern Recognition and Machine Learning Lab, Gachon University, 1342 Seongnamdaero, Sujeonggu, Seongnam, Republic of Korea
(3) Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
(4) Department of Information Technology, University of Human Development, Sulaymaniyah, Iraq
Query optimization is the process of identifying the best Query Execution Plan (QEP). The query optimizer produces a close-to-optimal QEP for the given queries based on minimum resource usage. The problem is that a given query has many equivalent execution plans, each with a corresponding execution cost, so producing an effective query plan requires examining a large number of alternatives. Access plan recommendation is an alternative technique to database query optimization that reuses previously generated QEPs to execute new queries. In this technique, the query optimizer uses clustering methods to identify groups of similar queries. However, clustering large query datasets is challenging for traditional clustering algorithms because of the long processing time. Numerous cloud-based platforms, such as Hadoop, Hive, and Pig, have been introduced that offer low-cost solutions for processing distributed queries. This paper applies and tests a model for clustering large query datasets of varying sizes in parallel using MapReduce. The results demonstrate that the parallel implementation of query workload clustering achieves good scalability.
https://peerj.com/articles/cs-580.pdf
Query optimization
Access plan recommendation
Cluster computing
Parallel Processing
MapReduce
DBSCAN Algorithm
collection DOAJ
language English
format Article
sources DOAJ
author Elham Azhir
Nima Jafari Navimipour
Mehdi Hosseinzadeh
Arash Sharifi
Aso Darwesh
title A technique for parallel query optimization using MapReduce framework and a semantic-based clustering method
publisher PeerJ Inc.
series PeerJ Computer Science
issn 2376-5992
publishDate 2021-06-01
description Query optimization is the process of identifying the best Query Execution Plan (QEP). The query optimizer produces a close-to-optimal QEP for the given queries based on minimum resource usage. The problem is that a given query has many equivalent execution plans, each with a corresponding execution cost, so producing an effective query plan requires examining a large number of alternatives. Access plan recommendation is an alternative technique to database query optimization that reuses previously generated QEPs to execute new queries. In this technique, the query optimizer uses clustering methods to identify groups of similar queries. However, clustering large query datasets is challenging for traditional clustering algorithms because of the long processing time. Numerous cloud-based platforms, such as Hadoop, Hive, and Pig, have been introduced that offer low-cost solutions for processing distributed queries. This paper applies and tests a model for clustering large query datasets of varying sizes in parallel using MapReduce. The results demonstrate that the parallel implementation of query workload clustering achieves good scalability.
topic Query optimization
Access plan recommendation
Cluster computing
Parallel Processing
MapReduce
DBSCAN Algorithm
url https://peerj.com/articles/cs-580.pdf