Distributed KNN Query Processing for Sequences with Similar Patterns

Bibliographic Details
Main Authors: Nian-Song Wan, 萬年松
Other Authors: Hsiao-Ping Tsai
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/24127054406707668319
Description
Summary: Master's === National Chung Hsing University === Department of Electrical Engineering === 100 === Sequence data, such as moving-object trajectories, biological gene sequences, and commodity purchase records, are ubiquitous in daily life and of high research value. Sequence pattern analysis is one of the most important research fields in data mining. Its goal is to discover important features (patterns) hidden in sequence data, such as movement regularity in trajectories, motifs in DNA or protein sequences, or purchasing behavior in transaction sequences. Recently, with advances in DNA microarray chips, sensing, and wireless techniques, huge amounts of sequence data have been accumulating rapidly. Since a centralized server cannot handle such huge and ubiquitous sequence data, a distributed approach is the natural trend. When sequence data are both huge and distributed, querying them efficiently is an important and difficult problem. In this thesis, we focus on the problem of distributed KNN queries for sequences with similar patterns. To solve this problem, we propose a distributed and incremental KNN query with binary probing (DIKNN-BP) algorithm. It mines sequence patterns using a Probabilistic Suffix Tree (PST) and transmits the patterns to remote servers in a progressive manner. The remote servers estimate upper and lower bounds on the similarity between the target object and each remote candidate object, and prune unqualified candidates based on these bounds. In addition, we derive five theorems and propose a binary probing approach to speed up the convergence of the upper and lower bounds, and thus the KNN query. According to the experimental results, the DIKNN-BP algorithm obtains the KNN solution with fewer patterns transmitted and with less data transmitted. Compared with a naive approach, it achieves a shorter query execution time.
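
The bound-based pruning step mentioned in the summary can be illustrated with a minimal sketch. This is a generic KNN pruning rule, not the thesis's DIKNN-BP algorithm: it does not implement the five theorems, the PST mining, or binary probing. The function name `prune_candidates`, the candidate labels, and the bound values are all hypothetical, chosen only for illustration. The rule assumes that a larger similarity value means a closer match: a candidate whose upper bound falls below the k-th largest lower bound cannot be among the k most similar objects.

```python
from typing import Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]  # candidate -> (lower, upper) similarity bound


def prune_candidates(bounds: Bounds, k: int) -> Bounds:
    """Keep only the candidates that could still be among the k most similar.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(bounds) <= k:
        # Fewer candidates than k: nothing can be ruled out yet.
        return dict(bounds)
    # The k-th largest lower bound is a similarity that at least k candidates reach.
    lowers = sorted((lo for lo, _ in bounds.values()), reverse=True)
    threshold = lowers[k - 1]
    # A candidate survives only if its upper bound can still reach that threshold.
    return {name: (lo, hi) for name, (lo, hi) in bounds.items() if hi >= threshold}


# Made-up bounds for four remote candidates, with k = 2.
first_round = {"a": (0.6, 0.9), "b": (0.5, 0.7), "c": (0.1, 0.4), "d": (0.2, 0.55)}
survivors = prune_candidates(first_round, k=2)

# Suppose a further transmitted pattern tightens the upper bound of "d".
second_round = dict(survivors, d=(0.2, 0.45))
final = prune_candidates(second_round, k=2)
```

In this sketch, each round of tighter bounds lets more candidates be discarded, which is the general reason that faster convergence of the bounds shortens a query of this kind.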