Distributed KNN Query Processing for Sequences with Similar Patterns

Master's === National Chung Hsing University (國立中興大學) === Department of Electrical Engineering === 100 === Sequence data, such as moving-object trajectories, biological gene sequences, and commodity purchase records, are ubiquitous in daily life and of high research value. Sequence pattern analysis is one of the most important research fields in da...


Bibliographic Details
Main Authors: Nian-Song Wan, 萬年松
Other Authors: Hsiao-Ping Tsai
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/24127054406707668319
id ndltd-TW-100NCHU5441074
record_format oai_dc
spelling ndltd-TW-100NCHU5441074 2016-11-06T04:19:14Z http://ndltd.ncl.edu.tw/handle/24127054406707668319 Distributed KNN Query Processing for Sequences with Similar Patterns 分散式環境下基於序列型樣之 KNN 查詢 Nian-Song Wan 萬年松 Master's, National Chung Hsing University (國立中興大學), Department of Electrical Engineering, 100 Hsiao-Ping Tsai 蔡曉萍 2012 學位論文 ; thesis 73 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chung Hsing University (國立中興大學) === Department of Electrical Engineering === 100 === Sequence data, such as moving-object trajectories, biological gene sequences, and commodity purchase records, are ubiquitous in daily life and of high research value. Sequence pattern analysis is one of the most important research fields in data mining. Its goal is to discover important features (patterns) hidden in sequence data, such as movement regularity in trajectories, motifs in DNA or protein sequences, or purchasing behavior in transaction sequences. Recently, with advances in DNA microarray chips, sensing, and wireless techniques, huge amounts of sequence data have been accumulating rapidly. Because a centralized server cannot handle such huge and ubiquitous sequence data, a distributed approach is the natural choice. Given huge and distributed sequence data, querying the data efficiently is an important and difficult problem. In this thesis, we focus on the problem of distributed KNN queries over sequences with similar patterns. To solve this problem, we propose a distributed and incremental KNN query with binary probing (DIKNN-BP) algorithm. It mines sequence patterns using a Probabilistic Suffix Tree (PST) and transmits the patterns to remote servers in a progressive manner. The remote servers estimate upper and lower bounds on the similarity between the target object and remote candidate objects, and use these bounds to prune unqualified candidates. In addition, we derive five theorems and propose a binary probing approach to speed up the convergence of the upper and lower bounds, and thus the KNN query. According to the experimental results, the DIKNN-BP algorithm obtains the KNN solution with fewer patterns and less data transmitted. Compared with a naive approach, it achieves a shorter query execution time.
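The bound-based pruning mentioned in the description can be illustrated with a minimal sketch. This is not the thesis's DIKNN-BP algorithm: it only shows the generic idea that once each candidate has a lower and an upper bound on its similarity to the target, any candidate whose upper bound falls below the k-th largest lower bound cannot be among the k nearest neighbors. The function name `prune_candidates`, the candidate ids, and the bound values are all hypothetical.

```python
import heapq


def prune_candidates(bounds, k):
    """Return the ids of candidates that may still be in the top-k.

    bounds: dict mapping a candidate id to a (lower, upper) pair of
            similarity bounds (hypothetical inputs for illustration).
    k:      number of nearest neighbors requested.
    """
    if k <= 0:
        return set()
    if len(bounds) <= k:
        # Too few candidates to prune anything.
        return set(bounds)
    # The k-th largest lower bound: at least k candidates are guaranteed
    # to reach this similarity, so it is a safe pruning threshold.
    threshold = heapq.nlargest(k, (lo for lo, _ in bounds.values()))[-1]
    # Keep only candidates whose upper bound can still reach the threshold.
    return {cid for cid, (_, hi) in bounds.items() if hi >= threshold}


# Hypothetical bounds for four remote candidates.
example_bounds = {
    "a": (0.8, 0.9),
    "b": (0.5, 0.7),
    "c": (0.1, 0.4),
    "d": (0.3, 0.6),
}
survivors = prune_candidates(example_bounds, k=2)
```

With k = 2 the threshold is 0.5, so candidate "c" (upper bound 0.4) is pruned while "a", "b", and "d" remain. Tighter bounds prune more candidates, which is why faster convergence of the bounds would shorten a query.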
author2 Hsiao-Ping Tsai
author_facet Hsiao-Ping Tsai
Nian-Song Wan
萬年松
author Nian-Song Wan
萬年松
spellingShingle Nian-Song Wan
萬年松
Distributed KNN Query Processing for Sequences with Similar Patterns
author_sort Nian-Song Wan
title Distributed KNN Query Processing for Sequences with Similar Patterns
publishDate 2012
url http://ndltd.ncl.edu.tw/handle/24127054406707668319