A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark

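Frequent subgraph mining (FSM) plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining), a Spark-based parallel frequent subgraph mining algorithm for a single large graph. To address the two computational challenges of FSM, we conduct the subgraph extension and support evaluation in parallel across all the distributed cluster worker nodes. In addition, we employ a heuristic search strategy and three novel optimizations in the support evaluation process: load balancing, pre-search pruning and top-down pruning, which significantly improve performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude for all datasets and can work with a lower support threshold.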


Bibliographic Details
Main Authors: Fengcai Qiao, Xin Zhang, Pei Li, Zhaoyun Ding, Shanshan Jia, Hui Wang
Format: Article
Language: English
Published: MDPI AG 2018-02-01
Series: Applied Sciences
Subjects: frequent subgraph mining; parallel algorithm; constraint satisfaction problem; Spark
Online Access: http://www.mdpi.com/2076-3417/8/2/230
id doaj-1c62dba863da4d0cbf91a888eb68be4f
record_format Article
spelling doaj-1c62dba863da4d0cbf91a888eb68be4f
2020-11-24T22:18:44Z
eng
MDPI AG
Applied Sciences
2076-3417
2018-02-01
Volume 8, Issue 2, Article 230
10.3390/app8020230
app8020230
A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark
Fengcai Qiao: College of Engineering System, National University of Defense Technology, Changsha 410073, Hunan, China
Xin Zhang: College of Engineering System, National University of Defense Technology, Changsha 410073, Hunan, China
Pei Li: College of Engineering System, National University of Defense Technology, Changsha 410073, Hunan, China
Zhaoyun Ding: College of Engineering System, National University of Defense Technology, Changsha 410073, Hunan, China
Shanshan Jia: Digital Media Center, Hunan Education Publishing House, Changsha 410073, Hunan, China
Hui Wang: College of Engineering System, National University of Defense Technology, Changsha 410073, Hunan, China
Frequent subgraph mining (FSM) plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining), a Spark-based parallel frequent subgraph mining algorithm for a single large graph. To address the two computational challenges of FSM, we conduct the subgraph extension and support evaluation in parallel across all the distributed cluster worker nodes. In addition, we employ a heuristic search strategy and three novel optimizations in the support evaluation process: load balancing, pre-search pruning and top-down pruning, which significantly improve performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude for all datasets and can work with a lower support threshold.
http://www.mdpi.com/2076-3417/8/2/230
frequent subgraph mining
parallel algorithm
constraint satisfaction problem
Spark
collection DOAJ
language English
format Article
sources DOAJ
author Fengcai Qiao
Xin Zhang
Pei Li
Zhaoyun Ding
Shanshan Jia
Hui Wang
title A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2018-02-01
description Frequent subgraph mining (FSM) plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining), a Spark-based parallel frequent subgraph mining algorithm for a single large graph. To address the two computational challenges of FSM, we conduct the subgraph extension and support evaluation in parallel across all the distributed cluster worker nodes. In addition, we employ a heuristic search strategy and three novel optimizations in the support evaluation process: load balancing, pre-search pruning and top-down pruning, which significantly improve performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude for all datasets and can work with a lower support threshold.
topic frequent subgraph mining
parallel algorithm
constraint satisfaction problem
Spark
url http://www.mdpi.com/2076-3417/8/2/230