Model-based clustering for aCGH data using variational EM

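DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. Copy number variations (of which CNAs are a subset) are a common phenomenon and not much is known about the nature of many of the mutations. By clustering patients according to CNA patterns, we can identify recurrent CNAs and understand molecular heterogeneity. This differs from normal distance-based clustering that doesn’t exploit the sequential structure of the data. Our approach is based on the hmmmix model introduced by [Sha08]. We show how it can be trained with variational methods to achieve better results and make it more flexible. We show how this allows for soft patient clusterings and how it partly addresses the difficult issue of determining the number of clusters to use. We compare the performance of our method with that of [Sha08] using their original benchmark test as well as with synthetic data generated from the hmmmix model itself. We show how our method can be parallelized and adapted to huge datasets.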

Bibliographic Details
Main Author: Alain, Guillaume
Format: Others
Language: English
Published: University of British Columbia 2009
Online Access: http://hdl.handle.net/2429/11992
id ndltd-UBC-oai-circle.library.ubc.ca-2429-11992
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-11992 2018-01-05T17:23:38Z Model-based clustering for aCGH data using variational EM Alain, Guillaume Science, Faculty of Computer Science, Department of Graduate 2009-08-11T15:29:17Z 2009-08-11T15:29:17Z 2009 2009-11 Text Thesis/Dissertation http://hdl.handle.net/2429/11992 eng Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ 2502822 bytes application/pdf University of British Columbia
collection NDLTD
language English
format Others
sources NDLTD
description DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. Copy number variations (of which CNAs are a subset) are a common phenomenon and not much is known about the nature of many of the mutations. By clustering patients according to CNA patterns, we can identify recurrent CNAs and understand molecular heterogeneity. This differs from normal distance-based clustering that doesn’t exploit the sequential structure of the data. Our approach is based on the hmmmix model introduced by [Sha08]. We show how it can be trained with variational methods to achieve better results and make it more flexible. We show how this allows for soft patient clusterings and how it partly addresses the difficult issue of determining the number of clusters to use. We compare the performance of our method with that of [Sha08] using their original benchmark test as well as with synthetic data generated from the hmmmix model itself. We show how our method can be parallelized and adapted to huge datasets. (Science, Faculty of; Computer Science, Department of; Graduate)
author Alain, Guillaume
title Model-based clustering for aCGH data using variational EM
publisher University of British Columbia
publishDate 2009
url http://hdl.handle.net/2429/11992