Fast R Functions for Robust Correlations and Hierarchical Clustering

Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation can handle calculations without missing values efficiently, but is inefficient when applied...

Bibliographic Details
Main Authors:	Peter Langfelder, Steve Horvath
Format:	Article
Language:	English
Published:	Foundation for Open Access Statistics 2012-01-01
Series:	Journal of Statistical Software
Subjects:	Pearson correlation; robust correlation; hierarchical clustering; R
Online Access:	http://www.jstatsoft.org/v46/i11/paper

id	doaj-ffbd5b8da09a4bf9afb7a8dc48f1f52e
record_format	Article
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Peter Langfelder, Steve Horvath
title	Fast R Functions for Robust Correlations and Hierarchical Clustering
publisher	Foundation for Open Access Statistics
series	Journal of Statistical Software
issn	1548-7660
publishDate	2012-01-01
description	Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation can handle calculations without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing values. We present an implementation of Pearson correlation calculation that can lead to substantial speedup on data with a relatively small number of missing entries. Further, we parallelize all calculations and thus achieve further speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA. The hierarchical clustering algorithm implemented in the R function hclust is an order n^3 (n is the number of clustered objects) version of a publicly available clustering algorithm (Murtagh 2012). We present the package flashClust that implements the original algorithm, which in practice achieves order approximately n^2, leading to substantial time savings when clustering large data sets.
topic	Pearson correlation; robust correlation; hierarchical clustering; R
url	http://www.jstatsoft.org/v46/i11/paper

Similar Items