Intelligent Resource Management for Large-scale Data Stream Processing

Bibliographic Details
Main Author: Stein, Oliver
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för informationsteknologi 2019
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-391927
Description
Summary: As the use of cloud computing resources grows, using those resources efficiently becomes increasingly important. Data stream processing is a paradigm gaining in popularity, with tools such as Apache Spark Streaming and Kafka widely available, and companies are shifting towards real-time monitoring for applications such as sensor networks, financial data, and anomaly detection. However, it is difficult for users to make efficient use of cloud computing resources, and studies show that a lot of energy and compute hardware is wasted. We propose an approach, based on bin packing algorithms, to optimizing resource usage in cloud computing environments designed for data stream processing frameworks. Test results show that resource usage is substantially improved as a result, and further improvements are suggested to increase this gain. The solution was implemented as an extension of the HarmonicIO data stream processing framework and evaluated through simulated workloads.
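The summary names bin packing as the basis of the approach but does not say which heuristic is used. As a rough illustration only, the following Python sketch applies the classic first-fit decreasing (FFD) heuristic to place hypothetical stream-processing workloads onto as few equally sized worker machines as it can. Each workload's demand is assumed to be an integer number of CPU percentage points, and each worker is assumed to have 100 units of capacity; the names `Bin` and `first_fit_decreasing` are hypothetical. This is a generic example of the technique, not the HarmonicIO extension described in the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    """A hypothetical worker machine with a fixed capacity (assumed unit: CPU percentage points)."""
    capacity: int
    items: list = field(default_factory=list)

    @property
    def used(self) -> int:
        # Total demand of the workloads already placed on this worker.
        return sum(self.items)

    def fits(self, size: int) -> bool:
        # True if a workload of this size can be added without exceeding capacity.
        return self.used + size <= self.capacity


def first_fit_decreasing(sizes, capacity):
    """Pack workload sizes into bins of equal capacity using first-fit decreasing."""
    bins = []
    # Place the largest workloads first; this is what makes it "decreasing".
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"workload of size {size} exceeds bin capacity {capacity}")
        # First fit: put the workload on the first existing bin with enough room.
        for b in bins:
            if b.fits(size):
                b.items.append(size)
                break
        else:
            # No existing bin has room, so open (provision) a new one.
            new_bin = Bin(capacity)
            new_bin.items.append(size)
            bins.append(new_bin)
    return bins


if __name__ == "__main__":
    # Hypothetical CPU demands of six stream-processing workloads.
    packed = first_fit_decreasing([60, 30, 50, 20, 40, 10], capacity=100)
    for i, b in enumerate(packed):
        print(f"worker {i}: {b.items} ({b.used}/{b.capacity})")
```

With these inputs the sketch uses three workers, [60, 40], [50, 30, 20] and [10], which matches the lower bound of ceil(210 / 100) = 3. FFD is a greedy heuristic: it is fast and is known to use at most roughly 11/9 times the optimal number of bins (plus a small constant), but it is not guaranteed to be optimal. In a streaming system where workloads arrive over time, an online variant that does not sort its input beforehand may be a better fit.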