Clustering Multivariate Time Series Using Hidden Markov Models

In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categoric...


Bibliographic Details
Main Authors:	Shima Ghassempour, Federico Girosi, Anthony Maeder
Format:	Article
Language:	English
Published:	MDPI AG 2014-03-01
Series:	International Journal of Environmental Research and Public Health
Subjects:	health trajectory; HMM; clustering
Online Access:	http://www.mdpi.com/1660-4601/11/3/2741

id	doaj-2e8bbe64199f4a15b5ce00854fecfdb3
record_format	Article
volume	11
issue	3
pages	2741-2763
doi	10.3390/ijerph110302741
affiliation	Shima Ghassempour: School of Computing, Engineering and Mathematics, University of Western Sydney, Campbelltown, NSW 2751, Australia
affiliation	Federico Girosi: Centre for Health Research, University of Western Sydney, Campbelltown, NSW 2751, Australia
affiliation	Anthony Maeder: School of Computing, Engineering and Mathematics, University of Western Sydney, Campbelltown, NSW 2751, Australia
collection	DOAJ
issn	1660-4601
description	In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs and finally proceed to cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge, and therefore are accessible to a wide range of researchers.
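The abstract outlines a three-step pipeline: fit an HMM to each trajectory, compute pairwise distances between the fitted HMMs, and cluster the resulting distance matrix. The sketch below illustrates that pipeline in Python with only the standard library; it is not the authors' implementation (the paper reports using standard R and Matlab packages), and several choices are assumptions made for illustration. The categorical variables at each time step are assumed to be pre-encoded as a single discrete symbol, and continuous variables are not modelled. Each trajectory gets a 2-state discrete HMM fitted by Baum-Welch with a small pseudo-count `eps`. The distance is a symmetrised, length-normalised log-likelihood distance, a common way to compare HMMs but not necessarily the one defined in the paper. A simple alternating k-medoids loop stands in for "a method based on a distance matrix". The functions `fit_hmm`, `hmm_distance` and `k_medoids` and the toy data are hypothetical.

```python
import math


def forward(seq, pi, A, B):
    """Scaled forward pass: returns scaled alphas, scale factors c_t and log P(seq | model)."""
    K = len(pi)
    alphas, scales = [], []
    for t, o in enumerate(seq):
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(K)]
        else:
            prev = alphas[-1]
            a = [sum(prev[j] * A[j][i] for j in range(K)) * B[i][o] for i in range(K)]
        c = sum(a)  # rescale at every step so long trajectories do not underflow
        alphas.append([x / c for x in a])
        scales.append(c)
    return alphas, scales, sum(math.log(c) for c in scales)


def fit_hmm(seq, n_states, n_symbols, n_iter=30, eps=1e-3):
    """Fit one discrete HMM to one trajectory with Baum-Welch (EM); assumes n_states >= 2.

    eps is a pseudo-count added to every expected count, so symbols never seen in this
    trajectory keep a small non-zero probability (needed to score other trajectories).
    """
    K, M, T = n_states, n_symbols, len(seq)
    pi = [1.0 / K] * K
    A = [[0.8 if i == j else 0.2 / (K - 1) for j in range(K)] for i in range(K)]
    # Deterministic, asymmetric emission start so the hidden states can specialise.
    B = [[1 + (i + k) % M for k in range(M)] for i in range(K)]
    B = [[x / sum(row) for x in row] for row in B]
    for _ in range(n_iter):
        # E-step: scaled forward-backward gives state posteriors (gamma) and transition counts (xi).
        alphas, scales, _ = forward(seq, pi, A, B)
        betas = [[1.0] * K for _ in range(T)]
        for t in range(T - 2, -1, -1):
            o = seq[t + 1]
            betas[t] = [sum(A[i][j] * B[j][o] * betas[t + 1][j] for j in range(K)) / scales[t + 1]
                        for i in range(K)]
        gamma = [[alphas[t][i] * betas[t][i] for i in range(K)] for t in range(T)]
        xi = [[0.0] * K for _ in range(K)]
        for t in range(T - 1):
            o = seq[t + 1]
            for i in range(K):
                for j in range(K):
                    xi[i][j] += alphas[t][i] * A[i][j] * B[j][o] * betas[t + 1][j] / scales[t + 1]
        # M-step: re-estimate pi, A and B from the expected counts plus eps.
        pi = gamma[0][:]
        A = [[(xi[i][j] + eps) / (sum(g[i] for g in gamma[:-1]) + K * eps) for j in range(K)]
             for i in range(K)]
        B = [[(sum(g[i] for g, o in zip(gamma, seq) if o == k) + eps)
              / (sum(g[i] for g in gamma) + M * eps) for k in range(M)] for i in range(K)]
    return pi, A, B


def hmm_distance(seqs, models):
    """Symmetrised, length-normalised log-likelihood distance between per-trajectory HMMs.

    ll[a][b] = log P(seqs[a] | models[b]); D[a][b] averages how much worse each of the two
    trajectories is explained by the other trajectory's model than by its own.
    """
    n = len(seqs)
    ll = [[forward(seqs[a], *models[b])[2] for b in range(n)] for a in range(n)]
    D = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                D[a][b] = 0.5 * ((ll[a][a] - ll[a][b]) / len(seqs[a])
                                 + (ll[b][b] - ll[b][a]) / len(seqs[b]))
    return D


def k_medoids(D, k, n_iter=100):
    """Simple alternating k-medoids on a precomputed distance matrix (not full PAM)."""
    n = len(D)
    # Deterministic start: the most central point, then the points farthest from chosen medoids.
    medoids = [min(range(n), key=lambda p: sum(D[p]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda p: min(D[p][m] for m in medoids)))
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: D[p][medoids[c]]) for p in range(n)]
        new = []
        for c in range(k):
            members = [p for p in range(n) if labels[p] == c] or [medoids[c]]  # keep empty clusters alive
            new.append(min(members, key=lambda m: sum(D[m][q] for q in members)))
        if new == medoids:
            break
        medoids = new
    return labels, medoids


# Hypothetical toy data: each time step's variables are pre-encoded as one symbol.
# 0/1 = healthy/minor condition (stable group); 2/3 = chronic/multimorbid (declining group).
stable = [[0, 0, 1, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]]
declining = [[2, 2, 2, 3, 3, 3, 3, 3, 3, 3], [2, 2, 2, 2, 3, 3, 3, 3, 3, 3], [2, 2, 3, 3, 3, 3, 3, 3, 3, 3]]
seqs = stable + declining
models = [fit_hmm(s, n_states=2, n_symbols=4) for s in seqs]  # step 1: one HMM per trajectory
D = hmm_distance(seqs, models)                                 # step 2: pairwise HMM distances
labels, medoids = k_medoids(D, k=2)                            # step 3: cluster the distance matrix
print(labels)
```

In this toy example the two groups use disjoint symbol sets, so each trajectory is explained much worse by the other group's models and the clustering should separate the two groups. Real health trajectories with mixed categorical and continuous variables would need a richer emission model or a carefully chosen symbol encoding, and the pseudo-count `eps` directly affects how heavily unseen symbols are penalised in the distance.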

