Protein Sequence Clustering Based on Phylogenetic Analysis

Master's thesis === National Cheng Kung University === Institute of Information Management === 95 === With the rapid development of bioinformatics, biologists often use protein sequences for analyses such as predicting and annotating unknown proteins. A necessary first step in such research is clustering. Once the sequences are clustered, t...

Bibliographic Details
Main Authors:	Chen-yi Hsu, 徐振議
Other Authors:	Hei-chia Wang
Format:	Others
Language:	zh-TW
Published:	2007
Online Access:	http://ndltd.ncl.edu.tw/handle/99493899605491883865

id	ndltd-TW-095NCKU5396013
record_format	oai_dc
spelling	ndltd-TW-095NCKU5396013 2015-10-13T14:16:09Z http://ndltd.ncl.edu.tw/handle/99493899605491883865 Protein Sequence Clustering Based on Phylogenetic Analysis 以演化分析為基礎之蛋白質超家族序列分群演算法之建立 Chen-yi Hsu 徐振議 Master's thesis, National Cheng Kung University, Institute of Information Management, 95 Hei-chia Wang 王惠嘉 2007 degree thesis ; thesis 44 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Cheng Kung University === Institute of Information Management === 95 === With the rapid development of bioinformatics, biologists often use protein sequences for analyses such as predicting and annotating unknown proteins. A necessary first step in such research is clustering. Once the sequences are clustered, sequences with similar functions fall into the same cluster, and follow-up research can build on the characteristics of each cluster. Currently, the most common approach to clustering relies on pairwise sequence similarity, but some researchers have shown that this simple approach is insufficient and causes many errors. In addition, some biologists want to understand not only the members and the number of clusters but also the relationships within and between clusters. This thesis focuses on clustering within a protein superfamily, an area that has received little research attention. Protein superfamily clustering differs from general sequence clustering because a superfamily still contains distinct subfamilies. These subfamilies may have different characteristics and relationships, so they need to be clustered on an evolutionary basis, and simple tools or methods cannot cope with this kind of problem. We therefore use a phylogenetic tree to cluster the sequences in a protein superfamily. First, the standard package PHYLIP is used to reconstruct the phylogenetic tree, which is then analyzed through a succession of procedures including distance parsing, threshold selection, splitting, merging, and re-merging. Finally, the phylogenetic tree is transformed into several subtrees, each representing one cluster. Because the method is based on a phylogenetic tree, its clusters carry evolutionary meaning, and it yields better clustering results than methods based on sequence similarity.
author2	Hei-chia Wang
author_facet	Hei-chia Wang Chen-yi Hsu 徐振議
author	Chen-yi Hsu 徐振議
spellingShingle	Chen-yi Hsu 徐振議 Protein Sequence Clustering Based on Phylogenetic Analysis
author_sort	Chen-yi Hsu
title	Protein Sequence Clustering Based on Phylogenetic Analysis
title_short	Protein Sequence Clustering Based on Phylogenetic Analysis
title_full	Protein Sequence Clustering Based on Phylogenetic Analysis
title_fullStr	Protein Sequence Clustering Based on Phylogenetic Analysis
title_full_unstemmed	Protein Sequence Clustering Based on Phylogenetic Analysis
title_sort	protein sequence clustering based on phylogenetic analysis
publishDate	2007
url	http://ndltd.ncl.edu.tw/handle/99493899605491883865
work_keys_str_mv	AT chenyihsu proteinsequenceclusteringbasedonphylogeneticanalysis AT xúzhènyì proteinsequenceclusteringbasedonphylogeneticanalysis AT chenyihsu yǐyǎnhuàfēnxīwèijīchǔzhīdànbáizhìchāojiāzúxùlièfēnqúnyǎnsuànfǎzhījiànlì AT xúzhènyì yǐyǎnhuàfēnxīwèijīchǔzhīdànbáizhìchāojiāzúxùlièfēnqúnyǎnsuànfǎzhījiànlì
_version_	1717750468038361088
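
The abstract names a threshold-based splitting step but does not specify it. The following is only a minimal sketch of what such a step might look like, assuming a rooted tree with branch lengths in which every branch longer than a chosen threshold is cut. The names Node and split_by_threshold, the example tree, and the threshold values are hypothetical illustrations and are not taken from the thesis. The merging and re-merging steps are not covered, and PHYLIP itself is not called.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node of a rooted tree (hypothetical structure, not from the thesis)."""
    name: Optional[str] = None      # leaf label; None for internal nodes
    length: float = 0.0             # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def split_by_threshold(root: Node, threshold: float) -> List[List[str]]:
    """Cut every branch longer than `threshold` and return the leaf names
    of each resulting subtree, sorted for a deterministic result."""
    clusters: List[List[str]] = []

    def collect(node: Node, current: List[str]) -> None:
        if not node.children:
            current.append(node.name)
            return
        for child in node.children:
            if child.length > threshold:
                # Long branch: leaves below it start a separate cluster.
                separate: List[str] = []
                collect(child, separate)
                if separate:
                    clusters.append(separate)
            else:
                # Short branch: leaves stay in the current cluster.
                collect(child, current)

    top: List[str] = []
    collect(root, top)
    if top:
        clusters.append(top)
    return sorted(sorted(cluster) for cluster in clusters)


# Assumed example tree, in Newick terms: ((A:0.1,B:0.1):0.2,(C:0.1,D:0.1):0.9);
tree = Node(children=[
    Node(length=0.2, children=[Node("A", 0.1), Node("B", 0.1)]),
    Node(length=0.9, children=[Node("C", 0.1), Node("D", 0.1)]),
])
print(split_by_threshold(tree, 0.5))
```

With the assumed threshold of 0.5, only the 0.9 branch is cut, so A and B end up in one cluster and C and D in another. How the thesis actually chooses the threshold is not stated in this record.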
