Protein Sequence Clustering Based on Phylogenetic Analysis


Bibliographic Details
Main Authors: Chen-yi Hsu, 徐振議
Other Authors: Hei-chia Wang
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/99493899605491883865
id ndltd-TW-095NCKU5396013
record_format oai_dc
spelling ndltd-TW-095NCKU5396013 2015-10-13T14:16:09Z http://ndltd.ncl.edu.tw/handle/99493899605491883865 Protein Sequence Clustering Based on Phylogenetic Analysis 以演化分析為基礎之蛋白質超家族序列分群演算法之建立 (Establishment of a Phylogeny-Based Sequence Clustering Algorithm for Protein Superfamilies) Chen-yi Hsu 徐振議 Master's thesis, National Cheng Kung University, Institute of Information Management, 95 Hei-chia Wang 王惠嘉 2007 thesis 44 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Cheng Kung University === Institute of Information Management === 95 === With the rapid development of bioinformatics, biologists often use protein sequences for analyses such as predicting and annotating unknown proteins. A necessary first step in such research is clustering. Once the sequences are clustered, sequences with similar functions fall into the same cluster, and follow-up research can build on the characteristics of each cluster. Currently, the most common way to cluster is to exploit the pairwise similarity between sequences, but some researchers have shown that this simple approach is insufficient and causes many errors. In addition, some biologists want to understand not only the membership and number of the clusters but also the relationships within and between them. This work focuses on clustering within a protein superfamily, an area that has received less attention from researchers. Clustering a protein superfamily differs from general sequence clustering because a superfamily still contains distinct subfamilies. These subfamilies may have different characteristics and relationships, so they need to be clustered on an evolutionary basis, and simple tools or methods alone cannot cope with this kind of problem. For these reasons, we use a phylogenetic tree to cluster the sequences in a protein superfamily. First, the standard package Phylip is used to reconstruct the phylogenetic tree, which is then analyzed through a succession of procedures such as distance parsing, threshold choosing, splitting, merging, and re-merging. Finally, the phylogenetic tree is transformed into several sub-trees, each of which represents one cluster. Because the method is based on a phylogenetic tree, its clusters carry evolutionary meaning, and it gives better clustering results than methods based on sequence similarity.
author2 Hei-chia Wang
author_facet Hei-chia Wang
Chen-yi Hsu
徐振議
author Chen-yi Hsu
徐振議
spellingShingle Chen-yi Hsu
徐振議
Protein Sequence Clustering Based on Phylogenetic Analysis
author_sort Chen-yi Hsu
title Protein Sequence Clustering Based on Phylogenetic Analysis
title_short Protein Sequence Clustering Based on Phylogenetic Analysis
title_full Protein Sequence Clustering Based on Phylogenetic Analysis
title_fullStr Protein Sequence Clustering Based on Phylogenetic Analysis
title_full_unstemmed Protein Sequence Clustering Based on Phylogenetic Analysis
title_sort protein sequence clustering based on phylogenetic analysis
publishDate 2007
url http://ndltd.ncl.edu.tw/handle/99493899605491883865
work_keys_str_mv AT chenyihsu proteinsequenceclusteringbasedonphylogeneticanalysis
AT xúzhènyì proteinsequenceclusteringbasedonphylogeneticanalysis
AT chenyihsu yǐyǎnhuàfēnxīwèijīchǔzhīdànbáizhìchāojiāzúxùlièfēnqúnyǎnsuànfǎzhījiànlì
AT xúzhènyì yǐyǎnhuàfēnxīwèijīchǔzhīdànbáizhìchāojiāzúxùlièfēnqúnyǎnsuànfǎzhījiànlì
_version_ 1717750468038361088
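
The abstract names a "splitting" step that turns a phylogenetic tree into sub-trees, each treated as one cluster. The record does not give the thesis's actual rules, so the following is only a minimal sketch of one plausible reading: cut every branch whose length exceeds a threshold and collect the leaves of each remaining piece. The nested-list tree representation, the function name `split_by_threshold`, the toy branch lengths, and the threshold values are all assumptions made for illustration. The merging and re-merging steps are not shown.

```python
from typing import List, Tuple, Union

# Hypothetical tree representation (an assumption, not the thesis's format):
# a node is either a leaf name (str) or a list of (child, branch_length) pairs.
Tree = Union[str, List[Tuple["Tree", float]]]


def split_by_threshold(tree: Tree, threshold: float) -> List[List[str]]:
    """Cut every branch longer than `threshold`; return the leaf clusters."""
    clusters: List[List[str]] = []

    def walk(node: Tree) -> List[str]:
        # Returns the leaves still connected to `node` after cutting below it.
        if isinstance(node, str):
            return [node]
        attached: List[str] = []
        for child, length in node:
            leaves = walk(child)
            if length > threshold:
                # Long branch: the child's piece becomes its own cluster.
                if leaves:
                    clusters.append(leaves)
            else:
                # Short branch: the child's leaves stay with this node.
                attached.extend(leaves)
        return attached

    root_leaves = walk(tree)
    if root_leaves:
        clusters.append(root_leaves)
    return clusters


# Toy superfamily: two tight subfamilies on long branches plus one leaf
# sitting close to the root. All values are made up for illustration.
toy: Tree = [
    ([("A", 0.1), ("B", 0.2)], 0.9),
    ([("C", 0.1), ("D", 0.15)], 0.8),
    ("E", 0.05),
]

print(split_by_threshold(toy, 0.5))
print(split_by_threshold(toy, 1.0))
```

With the lower threshold the two long branches are cut and the toy tree falls into three clusters. With the higher threshold nothing is cut and all five sequences stay together, which illustrates why the threshold-choosing step matters. A real pipeline would read the tree that Phylip writes out rather than a hand-built list.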