Algorithms for evolving graph analysis

Bibliographic Details
Main Authors: Ren, Chenghui, 任成會
Other Authors: Kao, CM
Language: English
Published: The University of Hong Kong (Pokfulam, Hong Kong) 2014
Subjects: Graph theory - Data processing
Online Access: http://hdl.handle.net/10722/197105
id ndltd-HKU-oai-hub.hku.hk-10722-197105
record_format oai_dc
spelling ndltd-HKU-oai-hub.hku.hk-10722-197105 2015-07-29T04:02:34Z Algorithms for evolving graph analysis Ren, Chenghui 任成會 Kao, CM Cheung, DWL Graph theory - Data processing published_or_final_version Computer Science Doctoral Doctor of Philosophy 2014-05-07T23:15:27Z 2014-05-07T23:15:27Z 2014 PG_Thesis 10.5353/th_b5185948 b5185948 http://hdl.handle.net/10722/197105 eng HKU Theses Online (HKUTO) Creative Commons: Attribution 3.0 Hong Kong License The author retains all proprietary rights, (such as patent rights) and the right to use in future works. The University of Hong Kong (Pokfulam, Hong Kong)
collection NDLTD
language English
sources NDLTD
topic Graph theory - Data processing
spellingShingle Graph theory - Data processing
Ren, Chenghui
任成會
Algorithms for evolving graph analysis
description In many applications, entities and their relationships are represented by graphs. Examples include social networks (users and friendships), the WWW (web pages and hyperlinks) and bibliographic networks (authors and co-authorship). In a dynamic world, information changes, and so the graphs representing the information evolve with time. For example, a Facebook link between two friends is established, or a hyperlink is added to a web page. We propose that historical graph-structured data be archived for analytical processing. We call a historical evolving graph sequence an EGS. We study the problem of efficient query processing on an EGS, a problem with many applications in evolving graph analysis. To solve the problem, we propose a solution framework called FVF and a cluster-based LU decomposition algorithm called CLUDE, both of which evaluate queries efficiently to support EGS analysis. The Find-Verify-and-Fix (FVF) framework applies to a wide range of queries. We demonstrate how some important graph measures, including shortest-path distance, closeness centrality and graph centrality, can be efficiently computed from EGSs using FVF. Since an EGS generally contains numerous large graphs, we also discuss several compact storage models that support our FVF framework. Through extensive experiments on both real and synthetic datasets, we show that our FVF framework is highly efficient in EGS query processing. A graph can be conveniently modeled by a matrix from which various quantitative measures, such as PageRank, SALSA, Personalized PageRank and Random Walk with Restart, are derived. To compute these measures, linear systems of the form Ax = b, where A is a matrix that captures a graph's structure, need to be solved. To facilitate solving the linear system, the matrix A is often decomposed into two triangular matrices (L and U). In a dynamic world, the graph that models it changes with time, and so does the matrix A that represents the graph. We consider a sequence of evolving graphs and its associated sequence of evolving matrices. We study how LU decomposition should be done over the sequence so that (1) the decomposition is efficient and (2) the resulting LU matrices best preserve the sparsity of the matrices A (i.e., the number of extra non-zero entries introduced in L and U is minimized). We propose a cluster-based algorithm, CLUDE, for solving the problem. Through an experimental study, we show that CLUDE is about an order of magnitude faster than the traditional incremental update algorithm. The number of extra non-zero entries introduced by CLUDE is also about an order of magnitude smaller than that of the traditional algorithm. CLUDE is thus an efficient algorithm for LU decomposition that produces high-quality LU matrices over an evolving matrix sequence. === published_or_final_version === Computer Science === Doctoral === Doctor of Philosophy
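The abstract's point about solving Ax = b by LU decomposition, and about extra non-zero entries appearing in L and U, can be illustrated with a minimal Python sketch. This is not CLUDE or FVF; it is a textbook Doolittle factorization without pivoting applied to a Random Walk with Restart system on a toy star graph. The function names (`rwr_matrix`, `lu_decompose`, `lu_solve`, `fill_in`), the restart probability of 0.15, and the graph itself are assumptions made for illustration only.

```python
def rwr_matrix(edges, n, restart=0.15):
    """Build A = I - (1 - restart) * W for an undirected graph.

    W is the column-normalised adjacency matrix. Assumption: every
    node has at least one edge, so no column sum is zero.
    """
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    degree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [(1.0 if i == j else 0.0) - (1.0 - restart) * adj[i][j] / degree[j]
         for j in range(n)]
        for i in range(n)
    ]


def lu_decompose(a):
    """Doolittle LU without pivoting; returns (L, U) with A = L * U.

    Skipping pivoting is safe here because A is strictly diagonally
    dominant by columns for 0 < restart < 1.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            if upper[i][k] == 0.0:
                continue  # nothing to eliminate in this row
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            upper[i][k] = 0.0
            for j in range(k + 1, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L * U * x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x


def fill_in(a, lower, upper, tol=1e-12):
    """Count off-diagonal positions that are zero in A but non-zero in L or U."""
    n = len(a)
    count = 0
    for i in range(n):
        for j in range(n):
            if i == j or a[i][j] != 0.0:
                continue
            entry = lower[i][j] if i > j else upper[i][j]
            if abs(entry) > tol:
                count += 1
    return count


# Toy star graph with 5 nodes, numbered so the hub is eliminated first or last.
n = 5
hub_first = [(0, leaf) for leaf in range(1, n)]
hub_last = [(n - 1, leaf) for leaf in range(n - 1)]

for name, edges in (("hub first", hub_first), ("hub last", hub_last)):
    a = rwr_matrix(edges, n)
    lower, upper = lu_decompose(a)
    # Restart vector concentrated on node 1, which is a leaf in both orderings.
    b = [0.15 if i == 1 else 0.0 for i in range(n)]
    scores = lu_solve(lower, upper, b)
    print(name, "fill-in:", fill_in(a, lower, upper), "scores:", scores)
```

On this toy graph, eliminating the hub first fills in all 12 off-diagonal positions among the leaves, while eliminating it last introduces none. That gap is the kind of sparsity loss the abstract refers to; how CLUDE chooses orderings across an evolving matrix sequence is not specified in this record and is not modeled here.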
author2 Kao, CM
author_facet Kao, CM
Ren, Chenghui
任成會
author Ren, Chenghui
任成會
author_sort Ren, Chenghui
title Algorithms for evolving graph analysis
title_short Algorithms for evolving graph analysis
title_full Algorithms for evolving graph analysis
title_fullStr Algorithms for evolving graph analysis
title_full_unstemmed Algorithms for evolving graph analysis
title_sort algorithms for evolving graph analysis
publisher The University of Hong Kong (Pokfulam, Hong Kong)
publishDate 2014
url http://hdl.handle.net/10722/197105
work_keys_str_mv AT renchenghui algorithmsforevolvinggraphanalysis
AT rènchénghuì algorithmsforevolvinggraphanalysis
_version_ 1716814237486022656