Approximately Mining Recently Repeating Patterns on Data Streams

Master's === 國立臺灣師範大學 === 資訊教育學系 === 94 === Repeating patterns represent temporal relations among data items and can be used for data summarization and prediction. Data in a growing number of applications is generated as a data stream. Consequently, traditional strategies for mining repeati...


Bibliographic Details
Main Author: 周蓓旻
Other Authors: 柯佳伶
Format: Others
Language: zh-TW
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/20533636393226266446
id ndltd-TW-094NTNU5395028
record_format oai_dc
spelling ndltd-TW-094NTNU5395028 2016-06-01T04:21:13Z http://ndltd.ncl.edu.tw/handle/20533636393226266446 Approximately Mining Recently Repeating Patterns on Data Streams 近似探勘資料流中最近重覆樣式方法之研究 周蓓旻 Master's 國立臺灣師範大學 資訊教育學系 94 柯佳伶 2006 學位論文 ; thesis 53 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === 國立臺灣師範大學 === 資訊教育學系 === 94 === Repeating patterns represent temporal relations among data items and can be used for data summarization and prediction. Data in a growing number of applications is generated as a data stream. Consequently, traditional strategies for mining repeating patterns in a static database are not suitable in a data stream environment. Moreover, in the dynamic environment of a data stream, mining repeating patterns from the entire historical data sequence does not capture the newest trends in the stream. For this reason, this thesis proposes two algorithms for efficiently mining recently repeating patterns in a data stream: the appearing-bit-sequence-based incremental mining algorithm and the basic-patterns estimating-based algorithm. The incremental mining approach uses appearing bit sequences to compute the frequencies of patterns efficiently within the sliding window. By maintaining the appearing bit sequences of maximal repeating patterns, it mines newly generated recently repeating patterns from the maintained information, which reduces processing cost when the window slides. The estimating-based method maintains the repeating patterns, potential repeating patterns, and 2-item patterns, and uses a partition-based scheme to count pattern frequencies. It constructs a data structure that supports efficient access to the retained patterns, and it estimates the frequency of an unretained pattern from the frequencies of its maximum prefix-subpattern and suffix-subpattern. The experimental results show that the incremental mining method is an efficient way to mine recently repeating patterns correctly, and that the estimating-based method provides an even faster way to discover recently repeating patterns from a data stream approximately.
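The abstract does not spell out how an appearing bit sequence is encoded, so the following is only a minimal sketch under stated assumptions: each item gets one bit per window position, a pattern's bit sequence marks the positions where an occurrence starts, and the pattern's frequency is the number of set bits. The function names (`item_bits`, `pattern_bits`, `frequency`) are hypothetical and not taken from the thesis, and the sketch recomputes everything from the window contents rather than updating incrementally as the window slides.

```python
def item_bits(window):
    """Map each item to an int whose bit i is set when window[i] == item.

    Assumption: this is one plausible encoding of an "appearing bit
    sequence"; the thesis may define it differently.
    """
    bits = {}
    for i, item in enumerate(window):
        bits[item] = bits.get(item, 0) | (1 << i)
    return bits


def pattern_bits(pattern, bits):
    """Return an int whose bit i is set when `pattern` starts at position i."""
    result = bits.get(pattern[0], 0)
    for offset, item in enumerate(pattern[1:], start=1):
        # Shift the item's positions back by its offset inside the pattern,
        # then keep only start positions that still match.
        result &= bits.get(item, 0) >> offset
    return result


def frequency(pattern, window):
    """Count occurrences of a non-empty `pattern` inside `window`."""
    return bin(pattern_bits(pattern, item_bits(window))).count("1")


if __name__ == "__main__":
    window = "ABCABCAB"  # hypothetical contents of the current sliding window
    for p in ("AB", "ABC", "BA"):
        print(p, frequency(p, window))
```

Because extending a pattern by one item costs one shift and one AND, frequencies of longer patterns can be derived from those of shorter ones without rescanning the window, which is presumably the kind of saving the incremental method relies on.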
author2 柯佳伶
author_facet 柯佳伶
周蓓旻
author 周蓓旻
spellingShingle 周蓓旻
Approximately Mining Recently Repeating Patterns on Data Streams
author_sort 周蓓旻
title Approximately Mining Recently Repeating Patterns on Data Streams
title_short Approximately Mining Recently Repeating Patterns on Data Streams
title_full Approximately Mining Recently Repeating Patterns on Data Streams
title_fullStr Approximately Mining Recently Repeating Patterns on Data Streams
title_full_unstemmed Approximately Mining Recently Repeating Patterns on Data Streams
title_sort approximately mining recently repeating patterns on data streams
publishDate 2006
url http://ndltd.ncl.edu.tw/handle/20533636393226266446
work_keys_str_mv AT zhōubèimín approximatelyminingrecentlyrepeatingpatternsondatastreams
AT zhōubèimín jìnshìtànkānzīliàoliúzhōngzuìjìnzhòngfùyàngshìfāngfǎzhīyánjiū
_version_ 1718289950433083392