Summary: | Sequential pattern mining, which finds frequent sequence patterns in a set of data sequences, is an important data mining problem. However, existing sequential pattern mining considers only the order in which items are purchased, not the position where each item is purchased. In this paper, we develop a sequential pattern mining algorithm using Apache Spark. The proposed algorithm finds frequent sequential patterns in parallel by distributing the data across several machines. We conducted a comprehensive performance study of the proposed algorithm on synthetic datasets, varying a range of parameter values. The experimental results show that the algorithm achieves a speedup that is linear in the number of machines.
|