Seeing the forest for the trees: tree-based uncertain frequent pattern mining

Many frequent pattern mining algorithms operate on precise data, where each data point is an exact accounting of a phenomenon (e.g., I have exactly two sisters). Alas, reasoning this way is a simplification for many real-world observations. Measurements, predictions, environmental factors, human erro...

Bibliographic Details
Main Author: MacKinnon, Richard Kyle
Other Authors: Leung, Carson K.-S. (Computer Science)
Published: Springer International Publishing 2016
Subjects: Data mining; Databases
Online Access: http://hdl.handle.net/1993/31059
id ndltd-MANITOBA-oai-mspace.lib.umanitoba.ca-1993-31059
record_format oai_dc
spelling ndltd-MANITOBA-oai-mspace.lib.umanitoba.ca-1993-31059 2016-01-15T03:52:16Z
Seeing the forest for the trees: tree-based uncertain frequent pattern mining
MacKinnon, Richard Kyle
Leung, Carson K.-S. (Computer Science)
Wang, Yang (Computer Science)
Wang, Xikui (Statistics)
Data mining
Databases
Many frequent pattern mining algorithms operate on precise data, where each data point is an exact accounting of a phenomenon (e.g., I have exactly two sisters). Alas, reasoning this way is a simplification for many real-world observations. Measurements, predictions, environmental factors, human error, etc. all introduce a degree of uncertainty into the mix. Tree-based frequent pattern mining algorithms such as FP-growth are particularly efficient due to their compact in-memory representations of the input database, but their uncertain extensions can require many more tree nodes. I propose new algorithms with tightened upper bounds to expected support, Tube-S and Tube-P, which mine frequent patterns from uncertain data. Extensive experimentation and analysis on datasets with different probability distributions are undertaken that show the tightness of my bounds in different situations.
February 2016
2016-01-13T22:43:27Z 2016-01-13T22:43:27Z
2014-05 2014-09 2014-09 2014-12 2014-12
MacKinnon, R.K., Leung, C.K.-S., Tanbeer, S.K. (2014) A scalable data analytics algorithm for mining frequent patterns from uncertain data. In Proc. PAKDDW 2014: 404-416. Springer International Publishing.
Leung, C.K.-S., MacKinnon, R.K. (2014) BLIMP: a compact tree structure for uncertain frequent pattern mining. In Proc. DaWaK 2014: 115-123. Springer International Publishing.
Leung, C.K.-S., MacKinnon, R.K., Tanbeer, S.K. (2014) Tightening upper bounds to the expected support for uncertain frequent pattern mining. In Proc. KES 2014: 328-337. Elsevier.
MacKinnon, R.K., Strauss, T.D., Leung, C.K.-S. (2014) DISC: efficient uncertain frequent pattern mining with tightened upper bounds. In Proc. ICDMW 2014: 1038-1045. IEEE Computer Society Press.
Leung, C.K.-S., MacKinnon, R.K., Tanbeer, S.K. (2014) Fast algorithms for frequent itemset mining from uncertain data. In Proc. ICDM 2014: 893-898. IEEE Computer Society Press.
http://hdl.handle.net/1993/31059
Springer International Publishing; Springer International Publishing; Elsevier; IEEE Computer Society Press; IEEE Computer Society Press
collection NDLTD
sources NDLTD
topic Data mining
Databases
author2 Leung, Carson K.-S. (Computer Science)
author MacKinnon, Richard Kyle
title Seeing the forest for the trees: tree-based uncertain frequent pattern mining
publisher Springer International Publishing
publishDate 2016
url http://hdl.handle.net/1993/31059