Clustering of gene expression data: performance and similarity analysis

Abstract. Background: DNA microarray technology is an innovative methodology in experimental molecular biology, which has produced huge amounts of valuable data in the profile of gene expression. Many clustering algorithms have been proposed to analyze ge...

Bibliographic Details
Main Authors:	Huang Chun-Hsi, Yin Longde, Ni Jun
Format:	Article
Language:	English
Published:	BMC 2006-12-01
Series:	BMC Bioinformatics

id	doaj-ea8c7ee9e3a8423094eceadfb30d7bc5
record_format	Article
spelling	doaj-ea8c7ee9e3a8423094eceadfb30d7bc5 2020-11-25T01:59:01Z eng BMC BMC Bioinformatics 1471-2105 2006-12-01 7 Suppl 4 S19 10.1186/1471-2105-7-S4-S19 Clustering of gene expression data: performance and similarity analysis Huang Chun-Hsi Yin Longde Ni Jun
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Huang Chun-Hsi Yin Longde Ni Jun
title	Clustering of gene expression data: performance and similarity analysis
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2006-12-01
description	Abstract. Background: DNA microarray technology is an innovative methodology in experimental molecular biology that has produced huge amounts of valuable gene expression profile data. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. Evaluating which clustering algorithms are feasible and applicable is becoming an important issue in bioinformatics research. Results: In this paper we first experimentally study three major clustering algorithms, Hierarchical Clustering (HC), Self-Organizing Map (SOM), and Self-Organizing Tree Algorithm (SOTA), using yeast (Saccharomyces cerevisiae) gene expression data, and compare their performance. We then introduce Cluster Diff, a new data mining tool, to analyze the similarity of clusters generated by different algorithms. The performance study shows that SOTA is more efficient than SOM, while HC is the least efficient. The similarity analysis shows that, given a target cluster, Cluster Diff can efficiently determine the closest match from a set of clusters. It is therefore an effective approach for evaluating different clustering algorithms. Conclusion: HC methods allow a visual, convenient representation of genes, but they are neither robust nor efficient. SOM is more robust against noise, but the number of clusters has to be fixed beforehand. SOTA combines the advantages of both hierarchical and SOM clustering: it allows a visual representation of the clusters and their structure and is not sensitive to noise. SOTA is also more flexible than the other two clustering methods. With our data mining tool, Cluster Diff, it is possible to analyze the similarity of clusters generated by different algorithms and thereby compare different clustering methods.

Similar Items