Summary: In a distributed environment, the volume of a graph database grows quickly because graphs emerge from several autonomous sources. Subgraph query processing is a challenging problem in such an environment. Many algorithms have been proposed for centralized settings; they mine frequent subgraphs from the graph database and construct an index, which is very expensive. These algorithms require many database scans to mine frequent subgraphs, and they follow a filter-and-verify approach, which requires many subgraph isomorphism tests. In this paper, we design a novel Map-Reduce-based multiple subgraph query processing framework, namely MSP. MSP processes multiple graph queries using a distributed index. The framework relies entirely on graph partitioning and indexing. Moreover, to improve its performance, we propose several solutions that balance the workload and reduce the size of the Integrated Graph Index. We propose a structure-based partitioning technique and a distributed way of building the Integrated Graph Index. This work uses two Map-Reduce rounds: the first round partitions the graphs and creates an index for each partition, and the second round processes subgraph queries and maintains the index. A good partitioning distributes the load equally across the machines in the cluster, which reduces the index size and improves the performance of query evaluation. Together, the graph partitioning and the Integrated Graph Index reduce the search space of query graphs. Our approach allows data graphs to be added incrementally to the Integrated Graph Index while queries are being processed. We show experimentally that our approach remarkably decreases the execution time and scales subgraph query processing to large graph databases.
Keywords: Graph database, Big data, Structure-based graph partitioning, Parallel processing, Map-Reduce, Integrated Graph Index
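The two Map-Reduce rounds summarized above can be illustrated with a small, single-process Python sketch. It is not MSP's implementation: the edge-count bucketing used as a stand-in "structure-based" partition key, the edge-label-pair features used as the per-partition index, the backtracking isomorphism test, and every name (`partition_key`, `add_graph`, `BUCKET`, `NUM_PARTITIONS`, and so on) are hypothetical assumptions. `run_mapreduce` only imitates map, shuffle, and reduce in memory. Because the abstract does not say how MSP evaluates a query inside a partition, the sketch simply uses a filter-and-verify step there.

```python
# Hypothetical single-process sketch of a two-round Map-Reduce subgraph query
# pipeline; the partition key and index features are assumptions, not MSP's design.
from collections import defaultdict

BUCKET = 2          # assumed edge-count bucket width
NUM_PARTITIONS = 3  # assumed number of partitions (reducers)

def features(g):
    """Index features: one sorted (label, label) pair per edge."""
    return {tuple(sorted((g["labels"][u], g["labels"][v]))) for u, v in g["edges"]}

def partition_key(g):
    """Assumed structure-based key: bucket graphs by edge count."""
    return min(len(g["edges"]) // BUCKET, NUM_PARTITIONS - 1)

def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one round: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return [pair for key in sorted(groups) for pair in reducer(key, groups[key])]

def new_partition():
    return {"max_edges": 0, "index": defaultdict(set), "graphs": {}}

def add_to_partition(part, g):
    part["graphs"][g["id"]] = g
    part["max_edges"] = max(part["max_edges"], len(g["edges"]))
    for f in features(g):
        part["index"][f].add(g["id"])

# Round 1: partition the data graphs and build one index per partition.
def map_partition(g):
    yield partition_key(g), g

def reduce_build_index(pid, graphs):
    part = new_partition()
    for g in graphs:
        add_to_partition(part, g)
    yield pid, part

def add_graph(partitions, g):
    """Incremental insert: route a new graph with the round-1 key."""
    add_to_partition(partitions.setdefault(partition_key(g), new_partition()), g)

def is_subgraph(q, g):
    """Backtracking search for a label- and edge-preserving injection q -> g."""
    g_edges = {frozenset(e) for e in g["edges"]}
    q_edges = {frozenset(e) for e in q["edges"]}
    order = list(q["labels"])
    def extend(i, mapping):
        if i == len(order):
            return True
        u = order[i]
        for v, label in g["labels"].items():
            if label != q["labels"][u] or v in mapping.values():
                continue
            if all(frozenset((mapping[w], v)) in g_edges
                   for w in mapping if frozenset((u, w)) in q_edges):
                mapping[u] = v
                if extend(i + 1, mapping):
                    return True
                del mapping[u]
        return False
    return extend(0, {})

# Round 2: route queries to partitions that can hold them, filter, then verify.
def answer_queries(partitions, queries):
    def map_route(q):
        for pid, part in partitions.items():
            if part["max_edges"] >= len(q["edges"]):  # prune too-small partitions
                yield pid, q
    def reduce_match(pid, qs):
        part = partitions[pid]
        for q in qs:
            cands = set(part["graphs"])
            for f in features(q):
                cands &= part["index"].get(f, set())
            for gid in sorted(cands):
                if is_subgraph(q, part["graphs"][gid]):
                    yield q["id"], gid
    pairs = run_mapreduce(queries, map_route, reduce_match)
    return {q["id"]: sorted(gid for qid, gid in pairs if qid == q["id"]) for q in queries}

graphs = [
    {"id": "g1", "labels": {1: "C", 2: "O"}, "edges": [(1, 2)]},
    {"id": "g2", "labels": {1: "C", 2: "C", 3: "O"}, "edges": [(1, 2), (2, 3)]},
    {"id": "g3", "labels": {1: "C", 2: "N", 3: "O", 4: "C"},
     "edges": [(1, 2), (2, 3), (3, 4), (4, 1)]},
]
queries = [
    {"id": "q1", "labels": {"a": "C", "b": "O"}, "edges": [("a", "b")]},
    {"id": "q2", "labels": {"a": "C", "b": "C", "c": "O"}, "edges": [("a", "b"), ("b", "c")]},
]
partitions = dict(run_mapreduce(graphs, map_partition, reduce_build_index))
answers = answer_queries(partitions, queries)
print(answers)  # {'q1': ['g1', 'g2', 'g3'], 'q2': ['g2', 'g3']}
```

In this toy run, the second-round mapper never sends `q2` to the partition holding only one-edge graphs, and each reducer verifies only the candidates that survive the feature filter. Both steps are simple illustrations of how partitioning plus indexing can shrink the search space. Calling `add_graph` reuses the round-1 key to insert a new data graph without rebuilding the other partitions, loosely mirroring the incremental index maintenance the abstract describes.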
|