A New Fast Vertical Method for Mining Frequent Patterns

Vertical mining methods are very effective for mining frequent patterns and usually outperform horizontal mining methods. However, vertical methods become inefficient because intersection time becomes costly when the cardinality of a tidset (tid-list or diffset) is very large or there are a...

Bibliographic Details
Main Authors:	Zhihong Deng, Zhonghui Wang
Format:	Article
Language:	English
Published:	Atlantis Press 2010-12-01
Series:	International Journal of Computational Intelligence Systems
Subjects:	data mining; frequent pattern mining; data structure; algorithm
Online Access:	https://www.atlantis-press.com/article/2104.pdf

id	doaj-384fd4418cd64fbe812336f566ddf37f
record_format	Article
spelling	doaj-384fd4418cd64fbe812336f566ddf37f; 2020-11-25T01:38:38Z; eng; Atlantis Press; International Journal of Computational Intelligence Systems; ISSN 1875-6883; 2010-12-01; volume 3, issue 6; DOI 10.2991/ijcis.2010.3.6.4; A New Fast Vertical Method for Mining Frequent Patterns; Zhihong Deng, Zhonghui Wang; https://www.atlantis-press.com/article/2104.pdf; data mining; frequent pattern mining; data structure; algorithm
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Zhihong Deng, Zhonghui Wang
title	A New Fast Vertical Method for Mining Frequent Patterns
publisher	Atlantis Press
series	International Journal of Computational Intelligence Systems
issn	1875-6883
publishDate	2010-12-01
description	Vertical mining methods are very effective for mining frequent patterns and usually outperform horizontal mining methods. However, vertical methods become inefficient because intersection time becomes costly when the cardinality of a tidset (tid-list or diffset) is very large or there are a very large number of transactions. In this paper, we propose a novel vertical algorithm called PPV for fast frequent pattern discovery. PPV is based on a data structure called the Node-list, which is obtained from a coding prefix-tree called the PPC-tree. The efficiency of PPV is achieved with three techniques. First, the Node-list is much more compact than previously proposed vertical structures (such as tid-lists or diffsets), since transactions with common prefixes share the same nodes of the PPC-tree. Second, support counting is transformed into the intersection of Node-lists, and the complexity of intersecting two Node-lists can be reduced to O(m+n) by an efficient strategy, where m and n are the cardinalities of the two Node-lists. Third, the ancestor-descendant relationship of two nodes, which is the basic step of intersecting Node-lists, can be verified very efficiently using the Pre-Post codes of the nodes. We experimentally compare our algorithm with FP-growth and two prominent vertical algorithms (Eclat and dEclat) on a number of databases. The experimental results show that PPV is an efficient algorithm that outperforms FP-growth, Eclat, and dEclat.
topic	data mining; frequent pattern mining; data structure; algorithm
url	https://www.atlantis-press.com/article/2104.pdf
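
The description above mentions two ideas that a short sketch may make more concrete: checking the ancestor-descendant relationship with Pre-Post codes, and intersecting two Node-lists in O(m+n) time. The Python below is only an illustration under stated assumptions, not the authors' PPV implementation. It assumes each Node-list entry is a (pre-order rank, post-order rank, count) triple, that both lists are sorted by pre-order rank, and that no node in a list is an ancestor of another node in the same list. The names PPCode, is_ancestor, intersect and support are hypothetical, and the choice to keep the descendant nodes in the result is an assumption made for this sketch.

```python
from typing import List, NamedTuple


class PPCode(NamedTuple):
    """Hypothetical Node-list entry: pre-order rank, post-order rank, count."""
    pre: int
    post: int
    count: int


def is_ancestor(a: PPCode, d: PPCode) -> bool:
    # In any rooted tree, a is a proper ancestor of d exactly when a is
    # visited earlier in pre-order and later in post-order.
    return a.pre < d.pre and a.post > d.post


def intersect(ancestors: List[PPCode], descendants: List[PPCode]) -> List[PPCode]:
    """Keep the descendant entries that lie below some ancestor entry.

    Assumes both lists are sorted by pre-order rank and that the entries
    within each list have disjoint subtrees. Each step advances one of the
    two indices, so the loop runs at most m + n times.
    """
    result = []
    i = j = 0
    while i < len(ancestors) and j < len(descendants):
        a, d = ancestors[i], descendants[j]
        if is_ancestor(a, d):
            result.append(d)
            j += 1
        elif d.pre < a.pre:
            # d precedes a and every later ancestor entry, so it has no match.
            j += 1
        else:
            # d lies after a's subtree, and so does every later descendant.
            i += 1
    return result


def support(node_list: List[PPCode]) -> int:
    # Assumed support of a pattern: the sum of the counts in its list.
    return sum(entry.count for entry in node_list)


# Toy tree (hand-made, not from the paper):
# root -> a(count 3) -> {b(2), c(1)},  root -> c(2) -> b(1)
a_list = [PPCode(pre=1, post=2, count=3)]
b_list = [PPCode(pre=2, post=0, count=2), PPCode(pre=5, post=3, count=1)]
c_list = [PPCode(pre=3, post=1, count=1), PPCode(pre=4, post=4, count=2)]

ab = intersect(a_list, b_list)
cb = intersect(c_list, b_list)
print(support(ab), support(cb))
```

In this toy tree the item b occurs twice below a and once below c, which is what the two intersections recover without touching individual transaction ids.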
