Efficient Mining and Maintaining Algorithms for Sequential Patterns

Ph.D. === Tamkang University === Department of Computer Science and Information Engineering, Doctoral Program === academic year 96 === Since the invention of writing, records of all kinds, in numbers and words, have increased dramatically in many domains. The invention of the computer triggered exponential data accumulation in science, education, e-Learning, business, and supermarkets. Autom...

Bibliographic Details
Main Author: Wei-Hua Hao (郝維華)
Other Authors: Nancy P. Lin
Format: Others
Language: en_US
Published: 2008
Online Access:http://ndltd.ncl.edu.tw/handle/51482633830794967649
id ndltd-TW-096TKU05392057
record_format oai_dc
spelling ndltd-TW-096TKU05392057 2015-10-13T13:47:54Z http://ndltd.ncl.edu.tw/handle/51482633830794967649 Efficient Mining and Maintaining Algorithms for Sequential Patterns 高效率挖掘及維護序列型樣演算法之研究 (A Study of Efficient Mining and Maintenance Algorithms for Sequential Patterns) Wei-Hua Hao 郝維華 Ph.D. Tamkang University Department of Computer Science and Information Engineering, Doctoral Program academic year 96 Nancy P. Lin 林丕靜 2008 thesis 85 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Ph.D. === Tamkang University === Department of Computer Science and Information Engineering, Doctoral Program === academic year 96 === Since the invention of writing, records of all kinds, in numbers and words, have increased dramatically in many domains. The invention of the computer triggered exponential data accumulation in science, education, e-Learning, business, and supermarkets. The automation of information systems has accelerated this trend, resulting in huge amounts of data stored in databases worldwide. These electronic data are regarded as a mirror of the real world, and we can make full and proper use of them. We try to discover interesting information or knowledge contained in databases through various methods, such as statistics, queries, graphics, and data mining, to better understand the world we live in; a modern information system serves this purpose. Among these diverse approaches, data mining has caught the attention of many domain experts and has achieved notable results. In the last decade, sequential pattern mining has become a promising topic and has aroused our interest. Because the essence of data mining is dealing with huge amounts of data, much previous research has focused on proposing efficient algorithms with shorter search time and run time. Hence, this dissertation focuses on developing algorithms that mine sequential patterns efficiently and are maintainable. In our view, there are two criteria for evaluating a mining algorithm: the number of database scans, and the volume of working space and search space. Accessing data on a hard drive is slower than accessing main memory, typically by orders of magnitude, so the fewer database scans, the better. Working space is the memory required to hold data during the mining process. Search space is the space used to store the result, either the set of frequent sequences or the data model. The less space an algorithm requires, the more likely it is that all processed data fit into main memory, which improves both efficiency and performance.
With these in mind, the three algorithms proposed in this dissertation share three characteristics: they scan the database only once, they mine without generating candidates, and they mine the full set of frequent sequences. The first algorithm, FAL, is designed to fully exploit both the downward closure property and the upward closure property to construct a lattice data model with a maximal-sequence representation. The second algorithm, FMCSP, builds on FAL but uses closed sequences instead of maximal sequences. Because a closed sequence is the longest sequence in its equivalence class, this representation shrinks the search space; furthermore, it allows the user to set, or tune, the minimum support threshold after the data model has been constructed. Finally, MMSP builds on the previous algorithms to handle incremental sequence databases: it can add new data to the data model either one by one or in batches, without rerunning the mining over the whole database from scratch.
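To make the difference between the full set of frequent sequences, closed sequences, and maximal sequences concrete, here is a minimal brute-force sketch in Python. It is not FAL, FMCSP, or MMSP. The abstract says those algorithms scan the database once and avoid candidates, whereas this sketch rescans a toy database for every candidate. It assumes that each data sequence is a tuple of single items (no itemset elements) and that support counts the data sequences containing the pattern as an ordered, gapped subsequence. The function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator past the first match, preserving order.
    return all(item in it for item in pattern)


def support(pattern, db):
    """Number of data sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if is_subsequence(pattern, seq))


def mine_frequent(db, min_sup):
    """Return {pattern: support} for every frequent sequence (naive, rescans db)."""
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    items = sorted({x for seq in db for x in seq})
    frequent = {}
    level = [()]  # grow patterns one item at a time from the empty prefix
    while level:
        next_level = []
        for prefix in level:
            for x in items:
                cand = prefix + (x,)
                s = support(cand, db)
                # Downward closure: an infrequent pattern has no frequent
                # extension, so only frequent patterns are grown further.
                if s >= min_sup:
                    frequent[cand] = s
                    next_level.append(cand)
        level = next_level
    return frequent


def closed_and_maximal(frequent):
    """Split a {pattern: support} table into closed and maximal patterns."""
    closed, maximal = set(), set()
    for p, s in frequent.items():
        supers = [q for q in frequent
                  if len(q) > len(p) and is_subsequence(p, q)]
        if not supers:  # no frequent proper supersequence
            maximal.add(p)
        # A supersequence with equal support would itself be frequent,
        # so checking only the entries of `frequent` is sufficient.
        if all(frequent[q] != s for q in supers):
            closed.add(p)
    return closed, maximal


db = [("a", "b", "c"), ("a", "b", "c"), ("a", "b")]
freq = mine_frequent(db, min_sup=2)
closed, maximal = closed_and_maximal(freq)
print(len(freq))          # → 7
print(sorted(closed))     # → [('a', 'b'), ('a', 'b', 'c')]
print(sorted(maximal))    # → [('a', 'b', 'c')]

# Raising the threshold later only needs the stored counts, not a rescan.
high = {p: s for p, s in freq.items() if s >= 3}
print(closed_and_maximal(high)[0])  # → {('a', 'b')}
```

In this toy database, seven sequences are frequent at a minimum support of 2. Only two of them are closed and one is maximal. The closed set preserves every support count (for example, <a, b> has support 3), while the maximal set alone does not. In this naive sketch, raising the threshold afterwards only filters the stored counts. Lowering it would require patterns that were never counted.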
author2 Nancy P. Lin
author Wei-Hua Hao
郝維華
title Efficient Mining and Maintaining Algorithms for Sequential Patterns
publishDate 2008
url http://ndltd.ncl.edu.tw/handle/51482633830794967649