ClusTrack: feature extraction and similarity measures for clustering of genome-wide data sets.

Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data.

Bibliographic Details
Main Authors:	Halfdan Rydbeck, Geir Kjetil Sandve, Egil Ferkingstad, Boris Simovski, Morten Rye, Eivind Hovig
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2015-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC4400084?pdf=render

id	doaj-15e4edb636e845dc932aee98117d44d7
record_format	Article
spelling	doaj-15e4edb636e845dc932aee98117d44d7; 2020-11-24T21:36:43Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2015-01-01; volume 10, issue 4, e0123261; doi:10.1371/journal.pone.0123261; http://europepmc.org/articles/PMC4400084?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Halfdan Rydbeck Geir Kjetil Sandve Egil Ferkingstad Boris Simovski Morten Rye Eivind Hovig
author_sort	Halfdan Rydbeck
title	ClusTrack: feature extraction and similarity measures for clustering of genome-wide data sets.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2015-01-01
description	Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. We here introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.
url	http://europepmc.org/articles/PMC4400084?pdf=render
_version_	1725939791577481216
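
The description above speaks of extracting features from genomic tracks, defining a similarity measure on them, and then applying a standard clustering algorithm. The record does not say which features or measures ClusTrack uses, so the following is only a minimal sketch under stated assumptions: each track is a list of half-open (start, end) intervals on a single chromosome, similarity is the Jaccard index of covered base pairs, and grouping is a simple thresholded single-linkage step. The function names, sample names, and threshold are all hypothetical and are not taken from the tool.

```python
from itertools import combinations


def merge(intervals):
    """Sort and merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals):
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(a, b):
    """Assumed similarity: shared covered bases divided by bases covered by either track."""
    union = covered(a + b)
    if union == 0:
        return 1.0  # two empty tracks are treated as identical
    intersection = covered(a) + covered(b) - union
    return intersection / union


def cluster(tracks, min_similarity):
    """Thresholded single linkage: join tracks whose similarity reaches the threshold."""
    groups = {name: {name} for name in tracks}
    for x, y in combinations(tracks, 2):
        if groups[x] is not groups[y] and jaccard(tracks[x], tracks[y]) >= min_similarity:
            joined = groups[x] | groups[y]
            for name in joined:
                groups[name] = joined
    unique = {frozenset(group) for group in groups.values()}
    return sorted(sorted(group) for group in unique)


# Hypothetical toy tracks, not data from the paper.
tracks = {
    "cellA_rep1": [(100, 200), (500, 700)],
    "cellA_rep2": [(110, 210), (520, 700)],
    "cellB": [(1000, 1300)],
}
print(cluster(tracks, min_similarity=0.5))
```

In this toy input the two cellA replicates overlap heavily and end up in one group, while cellB shares no covered bases with either and stays on its own. A real analysis would likely swap in other features or a hierarchical clustering routine from an established library.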

Similar Items