A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark
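Main Authors: | Fengcai Qiao, Xin Zhang, Pei Li, Zhaoyun Ding, Shanshan Jia, Hui Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2018-02-01 |
Series: | Applied Sciences |
Subjects: | frequent subgraph mining; parallel algorithm; constraint satisfaction problem; Spark |
Online Access: | http://www.mdpi.com/2076-3417/8/2/230 |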
id |
doaj-1c62dba863da4d0cbf91a888eb68be4f |
---|---|
citation |
Applied Sciences 2018, 8(2), 230. DOI: 10.3390/app8020230 |
affiliations |
Fengcai Qiao, Xin Zhang, Pei Li, Zhaoyun Ding, Hui Wang: College of Engineering System, National University of Defense Technology, Changsha 410073, Hunan, China; Shanshan Jia: Digital Media Center, Hunan Education Publishing House, Changsha 410073, Hunan, China |
collection |
DOAJ |
author |
Fengcai Qiao, Xin Zhang, Pei Li, Zhaoyun Ding, Shanshan Jia, Hui Wang |
issn |
2076-3417 |
description |
Frequent subgraph mining (FSM) plays an important role in graph mining and has attracted a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark-based Single Graph Mining), a Spark-based parallel algorithm for mining frequent subgraphs in a single large graph. To address the two computational challenges of FSM, we perform subgraph extension and support evaluation in parallel across all worker nodes of the distributed cluster. In addition, we employ a heuristic search strategy and three novel optimizations in the support evaluation process (load balancing, pre-search pruning and top-down pruning), which significantly improve performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude on all datasets and can work with a lower support threshold. |
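The abstract identifies support evaluation as one of the two computational challenges of FSM and lists constraint satisfaction problem among its keywords. The Python sketch below illustrates that framing under stated assumptions: each pattern vertex is treated as a CSP variable, its domain is pre-filtered by vertex label and degree, and support is the minimum image based (MNI) measure used in GraMi-style single-graph mining. The MNI measure, the label/degree filter, the early-termination rule in `is_frequent`, and every function name here are hypothetical choices for illustration; this is not SSiGraM's implementation, and it omits the Spark parallelization, load balancing and top-down pruning described in the paper.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
Labels = Dict[Node, str]
Adjacency = Dict[Node, Set[Node]]  # must be symmetric for an undirected graph


def _prepare(p_labels: Labels, p_edges: Iterable[Tuple[Node, Node]],
             g_labels: Labels, g_adj: Adjacency):
    """Build the pattern adjacency and one filtered domain per pattern vertex."""
    p_adj: Adjacency = {u: set() for u in p_labels}
    for a, b in p_edges:
        p_adj[a].add(b)
        p_adj[b].add(a)
    # Assumed pre-filter for each CSP domain: same label, at least as many neighbours.
    domains = {
        u: [v for v in g_labels
            if g_labels[v] == p_labels[u] and len(g_adj[v]) >= len(p_adj[u])]
        for u in p_labels
    }
    return p_adj, domains


def _has_embedding(assign, order, p_adj, domains, g_adj) -> bool:
    """Backtracking search: can the partial assignment be completed to an embedding?"""
    if len(assign) == len(order):
        return True
    u = order[len(assign)]
    used = set(assign.values())
    for v in domains[u]:
        if v in used:
            continue  # embeddings must be injective
        # Every already-assigned pattern neighbour of u must map to a graph neighbour of v.
        if all(assign[w] in g_adj[v] for w in p_adj[u] if w in assign):
            assign[u] = v
            if _has_embedding(assign, order, p_adj, domains, g_adj):
                return True
            del assign[u]
    return False


def _is_image(u, v, p_labels, p_adj, domains, g_adj) -> bool:
    """True if some embedding maps pattern vertex u to graph vertex v."""
    order = [u] + [w for w in p_labels if w != u]
    return _has_embedding({u: v}, order, p_adj, domains, g_adj)


def mni_support(p_labels, p_edges, g_labels, g_adj) -> int:
    """Exact MNI support: minimum over pattern vertices of their number of distinct images."""
    p_adj, domains = _prepare(p_labels, p_edges, g_labels, g_adj)
    return min(
        (sum(_is_image(u, v, p_labels, p_adj, domains, g_adj) for v in domains[u])
         for u in p_labels),
        default=0,
    )


def is_frequent(p_labels, p_edges, g_labels, g_adj, tau: int) -> bool:
    """Decide MNI support >= tau, stopping each vertex's check once the answer is known."""
    p_adj, domains = _prepare(p_labels, p_edges, g_labels, g_adj)
    for u in p_labels:
        candidates = domains[u]
        count = 0
        for i, v in enumerate(candidates):
            if _is_image(u, v, p_labels, p_adj, domains, g_adj):
                count += 1
                if count >= tau:
                    break  # u already has enough distinct images
            if count + (len(candidates) - i - 1) < tau:
                return False  # even if all remaining candidates succeed, u stays below tau
        if count < tau:
            return False
    return True
```

For example, in a graph with A-labeled vertices 1, 3, 5, B-labeled vertices 2, 4 and undirected edges 1-2, 2-3 and 4-5, the single-edge pattern A-B maps its A vertex to three distinct graph vertices but its B vertex to only two, so `mni_support` returns 2 and `is_frequent` holds for τ = 2 but not for τ = 3. Stopping a vertex's check as soon as it reaches τ, or as soon as τ becomes unreachable, is what makes `is_frequent` cheaper than computing the exact support.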
topic |
frequent subgraph mining; parallel algorithm; constraint satisfaction problem; Spark |