Adaptive Weighted Distance for Feature Vectors of Biological Sequences

Master's thesis === National Chiayi University === Graduate Institute of Computer Science and Information Engineering === 95 === Similarity searching in biological sequence databases has received substantial attention over the past decade and remains an important problem...


Bibliographic Details
Main Authors: Pei-Yuan Jou, 周培元
Other Authors: Huang-Cheng Kuo
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/47274308615858302312
id ndltd-TW-095NCYU5392020
record_format oai_dc
spelling ndltd-TW-095NCYU5392020 2015-10-13T14:53:15Z http://ndltd.ncl.edu.tw/handle/47274308615858302312 Adaptive Weighted Distance for Feature Vectors of Biological Sequences 生物序列之特徵向量的權重距離調配機制 Pei-Yuan Jou 周培元 Master's thesis National Chiayi University Graduate Institute of Computer Science and Information Engineering 95 Huang-Cheng Kuo 郭煌政 2007 thesis 0 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Chiayi University === Graduate Institute of Computer Science and Information Engineering === 95 === Similarity searching in biological sequence databases has received substantial attention over the past decade. Sequence alignment is the essential task when searching for similar sequences in bioinformatics. Because biological sequence databases have grown steadily over that period, finding sequences similar to a query sequence is time-consuming. By transforming sequences into numeric feature vectors, we can quickly filter out sequences whose feature vectors are far from the feature vector of the query sequence. The numeric feature vector contains three groups of features for a DNA sequence: Count, Extensible-Relative Position Dispersion (XRPD), and Extensible-Absolute Position Dispersion (XAPD). Each group has four dimensions, one each for A, C, T, and G. Euclidean distance and L1 distance are commonly used to compute the distance between two feature vectors. The author proposes an adaptive weighted distance instead. The adaptive weights are derived from the four nucleotide values in the Count group and are applied to both the XRPD and XAPD groups. In other words, if one nucleotide appears much more frequently than the others, its weight in the XRPD and XAPD groups should also be large. Experiments show that this feature-vector distance helps reflect the distance between the sequences themselves.
author2 Huang-Cheng Kuo
author_facet Huang-Cheng Kuo
Pei-Yuan Jou
周培元
author Pei-Yuan Jou
周培元
spellingShingle Pei-Yuan Jou
周培元
Adaptive Weighted Distance for Feature Vectors of Biological Sequences
author_sort Pei-Yuan Jou
title Adaptive Weighted Distance for Feature Vectors of Biological Sequences
publishDate 2007
url http://ndltd.ncl.edu.tw/handle/47274308615858302312
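
The weighting idea in the description can be illustrated with a minimal Python sketch. The abstract does not give the weight formula or the definitions of XRPD and XAPD, so everything below is an assumption made for illustration: the function names (`count_group`, `adaptive_weights`, `weighted_l1`) are hypothetical, the weights are taken to be the pooled relative nucleotide frequencies of the two sequences scaled to a mean of 1, and the XRPD and XAPD values are treated as precomputed inputs. It is not the thesis's actual method.

```python
from collections import Counter

NUCLEOTIDES = "ACTG"  # dimension order used in the record: A, C, T, G


def count_group(seq):
    """Count group: number of occurrences of A, C, T, and G in a DNA sequence."""
    counts = Counter(seq.upper())
    return [counts[n] for n in NUCLEOTIDES]


def adaptive_weights(count_a, count_b):
    """Assumed weighting: pooled relative frequency of each nucleotide,
    scaled so that the four weights average to 1."""
    pooled = [a + b for a, b in zip(count_a, count_b)]
    total = sum(pooled)
    if total == 0:
        return [1.0] * len(NUCLEOTIDES)  # fall back to uniform weights
    return [len(NUCLEOTIDES) * p / total for p in pooled]


def weighted_l1(vec_a, vec_b):
    """L1 distance over (count, xrpd, xapd) triples of 4-element lists.
    The Count group is left unweighted; the XRPD and XAPD groups are
    weighted per nucleotide by adaptive_weights."""
    count_a, xrpd_a, xapd_a = vec_a
    count_b, xrpd_b, xapd_b = vec_b
    w = adaptive_weights(count_a, count_b)
    d_count = sum(abs(a - b) for a, b in zip(count_a, count_b))
    d_xrpd = sum(wi * abs(a - b) for wi, a, b in zip(w, xrpd_a, xrpd_b))
    d_xapd = sum(wi * abs(a - b) for wi, a, b in zip(w, xapd_a, xapd_b))
    return d_count + d_xrpd + d_xapd
```

Under these assumptions, a unit difference in the XRPD or XAPD value of a frequent nucleotide contributes more to the distance than the same difference for a rare one, which is the behavior the description asks for. A filtering step would then discard database sequences whose `weighted_l1` distance to the query vector exceeds a chosen threshold before running full alignment.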