Adaptive Weighted Distance for Feature Vectors of Biological Sequences

Bibliographic Details
Main Authors: Pei-Yuan Jou, 周培元
Other Authors: Huang-Cheng Kuo
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/47274308615858302312
Description
Summary: Master's thesis === National Chiayi University === Department of Computer Science and Information Engineering (graduate program) === academic year 95 === Similarity search in biological sequence databases has received substantial attention over the past decade and remains an important issue. Sequence alignment is the essential task when searching for similar sequences in bioinformatics, but biological sequence databases have grown much larger over the past decade, so finding sequences similar to a query sequence is time-consuming. By transforming sequences into numeric feature vectors, we can quickly filter out sequences whose feature vectors are far from the feature vector of the query sequence. The numeric feature vector of a DNA sequence contains three groups of features: Count, Extensible-Relative Position Dispersion (XRPD), and Extensible-Absolute Position Dispersion (XAPD). Each group has four dimensions, one for each of A, C, T, and G. Euclidean distance and L1 distance are commonly used to compute the distance between two feature vectors. The author proposes an adaptive weighted distance instead. The adaptive weights are derived from the four nucleotide values in the Count group and are applied to both the XRPD and XAPD groups. In other words, if one kind of nucleotide appears much more frequently than the others, its weight in the XRPD and XAPD groups should also be large. Experiments show that this distance between feature vectors helps reflect the distance between the underlying sequences.
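The filtering idea above can be illustrated with a minimal Python sketch. It is not the thesis's method: the summary does not define how XRPD and XAPD are computed or give the exact weighting formula, so the sketch assumes those two groups arrive precomputed. It also assumes a hypothetical weighting rule in which each nucleotide's weight is its mean relative frequency in the two Count groups, scaled so that a uniform composition gives every weight 1.0. The sketch uses L1 as the base distance. The names FeatureVector, count_group, adaptive_weights, adaptive_weighted_l1, filter_candidates, and the threshold parameter are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

NUCLEOTIDES = ("A", "C", "T", "G")  # dimension order within each feature group


@dataclass
class FeatureVector:
    """Hypothetical container: one value per nucleotide in each of the three groups."""
    count: Dict[str, float]
    xrpd: Dict[str, float]  # XRPD values, ASSUMED to be precomputed elsewhere
    xapd: Dict[str, float]  # XAPD values, ASSUMED to be precomputed elsewhere


def count_group(seq: str) -> Dict[str, float]:
    """Count group: number of occurrences of each nucleotide in the sequence."""
    seq = seq.upper()
    return {n: float(seq.count(n)) for n in NUCLEOTIDES}


def adaptive_weights(a: FeatureVector, b: FeatureVector) -> Dict[str, float]:
    """ASSUMED weighting rule: mean relative frequency of each nucleotide in the
    two Count groups, scaled so a uniform composition gives every weight 1.0."""
    weights = {}
    for n in NUCLEOTIDES:
        freqs = []
        for v in (a, b):
            total = sum(v.count.values())
            freqs.append(v.count[n] / total if total else 0.25)
        weights[n] = len(NUCLEOTIDES) * sum(freqs) / 2
    return weights


def l1_distance(a: FeatureVector, b: FeatureVector) -> float:
    """Plain, unweighted L1 distance over all 12 dimensions."""
    return sum(
        abs(getattr(a, g)[n] - getattr(b, g)[n])
        for g in ("count", "xrpd", "xapd")
        for n in NUCLEOTIDES
    )


def adaptive_weighted_l1(a: FeatureVector, b: FeatureVector) -> float:
    """L1 distance where XRPD and XAPD terms are scaled by Count-derived weights;
    Count terms themselves stay unweighted."""
    w = adaptive_weights(a, b)
    total = sum(abs(a.count[n] - b.count[n]) for n in NUCLEOTIDES)
    for g in ("xrpd", "xapd"):
        total += sum(w[n] * abs(getattr(a, g)[n] - getattr(b, g)[n]) for n in NUCLEOTIDES)
    return total


def filter_candidates(query: FeatureVector, db: Sequence[FeatureVector],
                      threshold: float) -> List[int]:
    """Indices of database vectors within `threshold` of the query; only these
    would be passed on to the expensive alignment step."""
    return [i for i, v in enumerate(db) if adaptive_weighted_l1(query, v) <= threshold]
```

For a sequence such as "AAAC", this assumed rule gives A a weight of 3.0 and C a weight of 1.0. A dispersion difference on A therefore counts three times as much as the same difference on C. For a uniform composition the weighted distance reduces to the plain L1 distance. Swapping the absolute differences for squared differences under a square root would give a weighted Euclidean variant instead.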