Efficient Mining and Maintaining Algorithms for Sequential Patterns
Doctoral === Tamkang University === Doctoral Program, Department of Computer Science and Information Engineering === 96 === Since the invention of written characters, records of numbers and words have increased dramatically in many domains. The invention of the computer triggered exponential data accumulation in science, education, e-Learning, business, and supermarkets. Autom...
Main Authors: | Wei-Hua Hao, 郝維華 |
---|---|
Other Authors: | Nancy P. Lin |
Format: | Others |
Language: | en_US |
Published: | 2008 |
Online Access: | http://ndltd.ncl.edu.tw/handle/51482633830794967649 |
id |
ndltd-TW-096TKU05392057 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-096TKU05392057 2015-10-13T13:47:54Z http://ndltd.ncl.edu.tw/handle/51482633830794967649 Efficient Mining and Maintaining Algorithms for Sequential Patterns 高效率挖掘及維護序列型樣演算法之研究 Wei-Hua Hao 郝維華 Doctoral, Tamkang University, Doctoral Program, Department of Computer Science and Information Engineering, 96 Nancy P. Lin 林丕靜 2008 學位論文 ; thesis 85 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Doctoral === Tamkang University === Doctoral Program, Department of Computer Science and Information Engineering === 96 === Since the invention of written characters, records of numbers and words have increased dramatically in many domains. The invention of the computer triggered exponential data accumulation in science, education, e-Learning, business, and supermarkets. The automation of information systems has accelerated this trend and resulted in huge amounts of data stored in databases worldwide. These electronic data are considered a mirror of the real world, of which we can make full use. We try to discover interesting information or knowledge contained in databases via various methods, such as statistics, queries, graphics, and data mining, to further understand the world we live in. A modern information system is responsible for serving this purpose.
Among these diverse approaches, data mining has caught the attention of many domain experts and has produced notable achievements. In the last decade, mining sequential patterns became a promising topic and aroused our interest. Because the essence of data mining is dealing with huge amounts of data, much previous research has focused on proposing efficient algorithms with less search time and run time. Hence, this dissertation focuses on developing algorithms that mine sequential patterns efficiently and are maintainable. In our view, there are two criteria for evaluating a mining algorithm: the number of database scans, and the volume of working space and search space. Accessing data from a hard drive is much slower than accessing it from main memory, typically by orders of magnitude, which implies that the fewer scans the better. Working space is required to hold data during the mining process. Search space is the space that stores the result: the frequent sequence set or data model. The less space an algorithm requires, the greater the chance of fitting all processed data into main memory. Consequently, both efficiency and performance improve. With these in mind, all three algorithms proposed in this dissertation share three characteristics: they scan the database once, mine without candidate generation, and mine the full set of frequent sequences.
The first algorithm, FAL, is designed to fully utilize both the downward closure property and the upward closure property to construct a lattice data model with a maximal sequence representation. The second algorithm, FMCSP, inherits the design of FAL but applies the closed sequence concept instead of maximal sequences. A closed sequence is the longest sequence in its equivalence class; it shrinks the search space and, furthermore, allows the user to set, or tune, the minimum support threshold after the data model has been constructed. Finally, the MMSP algorithm builds on the previous algorithms to deal with incremental sequence databases. MMSP can handle incremental data added to the data model, either one sequence at a time or in batches, without rerunning the whole database from scratch.
|
author2 |
Nancy P. Lin |
author_facet |
Nancy P. Lin Wei-Hua Hao 郝維華 |
author |
Wei-Hua Hao 郝維華 |
spellingShingle |
Wei-Hua Hao 郝維華 Efficient Mining and Maintaining Algorithms for Sequential Patterns |
author_sort |
Wei-Hua Hao |
title |
Efficient Mining and Maintaining Algorithms for Sequential Patterns |
title_short |
Efficient Mining and Maintaining Algorithms for Sequential Patterns |
title_full |
Efficient Mining and Maintaining Algorithms for Sequential Patterns |
title_fullStr |
Efficient Mining and Maintaining Algorithms for Sequential Patterns |
title_full_unstemmed |
Efficient Mining and Maintaining Algorithms for Sequential Patterns |
title_sort |
efficient mining and maintaining algorithms for sequential patterns |
publishDate |
2008 |
url |
http://ndltd.ncl.edu.tw/handle/51482633830794967649 |
work_keys_str_mv |
AT weihuahao efficientminingandmaintainingalgorithmsforsequentialpatterns AT hǎowéihuá efficientminingandmaintainingalgorithmsforsequentialpatterns AT weihuahao gāoxiàolǜwājuéjíwéihùxùlièxíngyàngyǎnsuànfǎzhīyánjiū AT hǎowéihuá gāoxiàolǜwājuéjíwéihùxùlièxíngyàngyǎnsuànfǎzhīyánjiū |
_version_ |
1717743455099158528 |
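
The abstract's distinction between maximal and closed sequences can be illustrated with a small sketch. This is not the dissertation's FAL, FMCSP, or MMSP algorithm: it is a brute-force enumeration over a toy database, written only to show the definitions. All names here (`is_subsequence`, `support`, `frequent_sequences`, `maximal`, `closed`, `support_from_closed`, the database `db`) are hypothetical, and each sequence is simplified to an ordered tuple of single items rather than itemsets.

```python
from itertools import combinations

def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def support(pattern, database):
    """Number of database sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)

def frequent_sequences(database, min_support):
    """Brute force: enumerate every subsequence, keep those meeting min_support."""
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in idx))
    counted = {p: support(p, database) for p in candidates}
    return {p: s for p, s in counted.items() if s >= min_support}

def maximal(freq):
    """Frequent sequences with no frequent proper super-sequence."""
    return {p for p in freq
            if not any(p != q and is_subsequence(p, q) for q in freq)}

def closed(freq):
    """Frequent sequences with no proper super-sequence of equal support."""
    return {p for p in freq
            if not any(p != q and is_subsequence(p, q) and freq[q] == freq[p]
                       for q in freq)}

def support_from_closed(pattern, closed_with_support):
    """Recover a pattern's support from the closed set alone."""
    return max((s for q, s in closed_with_support.items()
                if is_subsequence(pattern, q)), default=0)

# Toy database (an assumption for illustration only).
db = [("a", "b", "c"), ("a", "b", "c"), ("a", "b"), ("a", "c")]
freq = frequent_sequences(db, min_support=2)
closed_freq = {p: freq[p] for p in closed(freq)}

# Downward closure: every subsequence of a frequent sequence is frequent.
downward_ok = all(
    tuple(p[i] for i in idx) in freq
    for p in freq
    for n in range(1, len(p) + 1)
    for idx in combinations(range(len(p)), n)
)
```

In this toy case the maximal set is the single sequence `("a", "b", "c")`, which loses the supports of its subsequences, while the closed set keeps enough information to recover every support. That is consistent with the abstract's remark that a closed-sequence model lets the minimum support threshold be tuned after the model is built.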