Self-management for large-scale distributed systems

Bibliographic Details
Main Author:	Al-Shishtawy, Ahmad
Format:	Doctoral thesis (monograph)
Language:	English
Published:	Computer Systems Laboratory; Sweden : KTH, 2012
Series:	SICS dissertation series (ISSN 1101-1335), no. 57
ISBN:	9789175014371
Subjects:	Computer and Information Science (Swedish heading: Data- och informationsvetenskap)
Access:	Open access (PDF)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:ri:diva-24046
	http://nbn-resolving.de/urn:isbn:9789175014371

Full description

Autonomic computing aims to make computing systems self-managing by using autonomic managers, in order to reduce the obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management.

In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing these challenges. Niche implements the autonomic computing architecture proposed by IBM in a fully decentralized way. Niche supports a network-transparent view of the system architecture, which simplifies the design of distributed self-management. Niche provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers.

In the second part, we discuss robustness of management and data consistency, both of which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time-consuming and error-prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn.
Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn.

For data consistency, we propose a majority-based distributed key-value store that is built on a peer-to-peer network and supports multiple consistency levels. The store enables a trade-off between high availability and data consistency. Using majorities avoids the potential drawbacks of master-based consistency control, namely a single point of failure and a potential performance bottleneck.

In the third part, we investigate self-management for Cloud-based storage systems, focusing on elasticity control using elements of control theory and machine learning. We have studied several designs of an elasticity controller, including a state-space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using a state-space model that makes it possible to trade off performance for cost. We describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control.
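The majority-based consistency control mentioned in the second part rests on one property: any two majorities of the same replica set share at least one replica, so no single master is needed to find the latest value. The following is a minimal Python sketch of that idea, not the thesis's protocol. It assumes an in-memory set of replicas, a per-key version counter, and writes to a key that are serialized (one writer at a time). The names `Replica`, `MajorityKVStore`, `read_latest` and `read_any` are hypothetical, with `read_any` standing in for a weaker consistency level.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical in-memory stand-in for a peer: key -> (version, value).
    store: dict = field(default_factory=dict)
    alive: bool = True


class MajorityKVStore:
    """Illustrative majority-quorum store; assumes writes to a key are serialized."""

    def __init__(self, n_replicas: int, seed: int = 0):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.majority = n_replicas // 2 + 1
        self.rng = random.Random(seed)

    def _quorum(self):
        # Pick a random majority among live replicas; fail if too few are alive.
        live = [r for r in self.replicas if r.alive]
        if len(live) < self.majority:
            raise RuntimeError("no majority available")
        return self.rng.sample(live, self.majority)

    def read_latest(self, key):
        # Strong read: any two majorities intersect, so (with serialized writes)
        # the highest version in this majority is the latest completed write.
        replies = [r.store.get(key, (0, None)) for r in self._quorum()]
        return max(replies, key=lambda vv: vv[0])

    def read_any(self, key):
        # Weaker read: one live replica answers, so the result may be stale.
        live = [r for r in self.replicas if r.alive]
        return self.rng.choice(live).store.get(key, (0, None))

    def write(self, key, value):
        # Learn the current version from one majority, then store version + 1
        # on a (possibly different) majority.
        version, _ = self.read_latest(key)
        new = (version + 1, value)
        for r in self._quorum():
            r.store[key] = new
        return new[0]
```

With five replicas, a strong read still returns the latest value after two replicas fail, because the surviving three must overlap the three that took the last write; losing a third replica leaves no majority, and the sketch refuses to answer rather than risk a stale read.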
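The combination of feedforward and feedback control described in the third part can be illustrated with a small sketch. This is not ElastMan's design. The feedforward term here is an assumed linear capacity model (servers = throughput / capacity per server), standing in for whatever workload model a real controller learns. The feedback term is a simple proportional-integral (PI) correction on measured latency, whereas the thesis also explores state-space feedback. All names, gains and units (`capacity_per_server`, `latency_setpoint_ms`, `kp`, `ki`) are hypothetical, and anti-windup is omitted for brevity.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FeedforwardFeedbackController:
    """Hypothetical elasticity controller sketch; not ElastMan's actual design."""
    capacity_per_server: float   # assumed ops/s one server sustains at the latency target
    latency_setpoint_ms: float   # desired latency (e.g. a high percentile), in ms
    kp: float = 0.05             # proportional gain: servers per ms of latency error
    ki: float = 0.01             # integral gain: servers per accumulated ms of error
    min_servers: int = 1
    max_servers: int = 100
    _integral: float = field(default=0.0, init=False)

    def feedforward(self, throughput: float) -> float:
        # Model-based term: servers the measured workload needs, ignoring disturbances.
        return throughput / self.capacity_per_server

    def feedback(self, latency_ms: float) -> float:
        # PI correction: latency above the setpoint adds servers, below removes them.
        error = latency_ms - self.latency_setpoint_ms
        self._integral += error
        return self.kp * error + self.ki * self._integral

    def decide(self, throughput: float, latency_ms: float) -> int:
        # Sum both terms, round up to whole servers, and clamp to the allowed range.
        target = self.feedforward(throughput) + self.feedback(latency_ms)
        return max(self.min_servers, min(self.max_servers, math.ceil(target)))
```

For example, with 1000 ops/s per server and a 10 ms setpoint, a load of 4500 ops/s at target latency yields 5 servers from the feedforward term alone; if latency then rises to 30 ms at the same load, the feedback term adds enough to request 6. The feedforward term reacts immediately to workload changes, while the feedback term corrects for model error.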

Similar Items