Improving Efficiency of Data Compaction by Creating & Evaluating a Random Compaction Strategy in Apache Cassandra

Background: Cassandra is a NoSQL database that stores its data in immutable tables called SSTables. These SSTables undergo a process called compaction to reclaim disk space and to improve read performance. Size Tiered Compaction Strategy and Leveled C...


Bibliographic Details
Main Author: KATIKI REDDY, RAHUL REDDY
Format: Others
Language: English
Published: Blekinge Tekniska Högskola, Institutionen för programvaruteknik 2020
Subjects:
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20182
id ndltd-UPSALLA1-oai-DiVA.org-bth-20182
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-bth-20182
2020-07-15T07:09:31Z
Improving Efficiency of Data Compaction by Creating & Evaluating a Random Compaction Strategy in Apache Cassandra
eng
KATIKI REDDY, RAHUL REDDY
Blekinge Tekniska Högskola, Institutionen för programvaruteknik
2020
Apache Cassandra
Compaction Strategy
Random Compaction
NoSQL
Design Science.
Software Engineering
Programvaruteknik
Student thesis
info:eu-repo/semantics/bachelorThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20182
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Apache Cassandra
Compaction Strategy
Random Compaction
NoSQL
Design Science.
Software Engineering
Programvaruteknik
description Background: Cassandra is a NoSQL database that stores its data in immutable tables called SSTables. These SSTables undergo a process called compaction to reclaim disk space and to improve read performance. Size Tiered Compaction Strategy and Leveled Compaction Strategy are the most widely used generic compaction strategies for different use cases. Their main limitations are space amplification and write amplification, respectively. This research aims to address the limitations of the existing generic compaction strategies.
Objectives: A new random compaction strategy will be created to improve the efficiency and effectiveness of compaction. This strategy will be evaluated by comparing its read, write and space amplification with those of the existing generic compaction strategies, for different use cases.
Methods: In this study, Design Science was used as the research method to answer both research questions. Focus group meetings were conducted to gain knowledge of the limitations of the existing compaction strategies, of the newly created random compaction strategy, and of appropriate solutions. During the evaluation, metrics were collected from a Prometheus server and visualized in a Grafana server. The compaction strategies were compared using statistical tests of significance.
Results: The results of this study show that the random compaction strategy performs similarly to the Leveled Compaction Strategy. The Random Compaction Strategy solves the space amplification problem of the Size Tiered Compaction Strategy and the write amplification problem of the Leveled Compaction Strategy. Eight important metrics were analyzed for all three compaction strategies.
Conclusions: The main artefact of this research is a new Random Compaction Strategy. After two iterations, a new, stable random compaction strategy was designed. The results were analyzed by comparing the Size Tiered Compaction Strategy, the Leveled Compaction Strategy and the Random Compaction Strategy on two different use cases. The new random compaction strategy performed well for the Ericsson buffer management use case.
author KATIKI REDDY, RAHUL REDDY
title Improving Efficiency of Data Compaction by Creating & Evaluating a Random Compaction Strategy in Apache Cassandra
publisher Blekinge Tekniska Högskola, Institutionen för programvaruteknik
publishDate 2020
url http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20182