Automatic software upgrades for distributed systems

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. === Includes bibliographical references (p. 156-164). === Upgrading the software of long-lived, highly-available distributed systems is difficult. It is not possible to upgrade all the...

Full description

Bibliographic Details
Main Author: Ajmani, Sameer, 1976-
Other Authors: Barbara H. Liskov.
Format: Others
Language:en_US
Published: Massachusetts Institute of Technology 2005
Subjects:
Online Access:http://hdl.handle.net/1721.1/28717
Description
Summary:Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. === Includes bibliographical references (p. 156-164). === Upgrading the software of long-lived, highly-available distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable and halting the system for an upgrade is unacceptable. Instead, upgrades may happen gradually, and there may be long periods of time when different nodes are running different software versions and need to communicate using incompatible protocols. We present a methodology and infrastructure that address these challenges and make it possible to upgrade distributed systems automatically while limiting service disruption. Our methodology defines how to enable nodes to interoperate across versions, how to preserve the state of a system across upgrades, and how to schedule an upgrade so as to limit service disrup- tion. The approach is modular: defining an upgrade requires understanding only the new software and the version it replaces. The upgrade infrastructure is a generic platform for distributing and installing software while enabling nodes to interoperate across versions. The infrastructure requires no access to the system source code and is transparent: node software is unaware that different versions even exist. We have implemented a prototype of the infrastructure called Upstart that intercepts socket communication using a dynamically-linked C++ library. Experiments show that Upstart has low overhead and works well for both local-area-and Internet systems. === by Sameer Ajmani. === Ph.D.