Performance Optimization Techniques and Tools for Distributed Graph Processing

In this thesis, we propose optimization techniques for distributed graph processing. First, we describe a data processing pipeline that leverages an iterative graph algorithm for automatic classification of web trackers. Using this application as a motivating example, we examine how asymmetrical con...

Full description

Bibliographic Details
Main Author: Kalavri, Vasiliki
Format: Doctoral Thesis
Language:English
Published: KTH, Programvaruteknik och Datorsystem, SCS 2016
Subjects:
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192471
http://nbn-resolving.de/urn:isbn:978-91-7729-101-5
Description
Summary:In this thesis, we propose optimization techniques for distributed graph processing. First, we describe a data processing pipeline that leverages an iterative graph algorithm for automatic classification of web trackers. Using this application as a motivating example, we examine how asymmetrical convergence of iterative graph algorithms can be used to reduce the amount of computation and communication in large-scale graph analysis. We propose an optimization framework for fixpoint algorithms and a declarative API for writing fixpoint applications. Our framework uses a cost model to automatically exploit asymmetrical convergence and evaluate execution strategies during runtime. We show that our cost model achieves speedup of up to 1.7x and communication savings of up to 54%. Next, we propose to use the concepts of semi-metricity and the metric backbone to reduce the amount of data that needs to be processed in large-scale graph analysis. We provide a distributed algorithm for computing the metric backbone using the vertex-centric programming model. Using the backbone, we can reduce graph sizes up to 88% and achieve speedup of up to 6.7x. === <p>QC 20160919</p>