Model-based clustering for aCGH data using variational EM
Main Author: | Alain, Guillaume |
---|---|
Format: | Others |
Language: | English |
Published: | University of British Columbia 2009 |
Online Access: | http://hdl.handle.net/2429/11992 |
id |
ndltd-UBC-oai-circle.library.ubc.ca-2429-11992 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UBC-oai-circle.library.ubc.ca-2429-11992 2018-01-05T17:23:38Z Model-based clustering for aCGH data using variational EM Alain, Guillaume DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. Copy number variations (of which CNAs are a subset) are a common phenomenon and not much is known about the nature of many of the mutations. By clustering patients according to CNA patterns, we can identify recurrent CNAs and understand molecular heterogeneity. This differs from normal distance-based clustering that doesn’t exploit the sequential structure of the data. Our approach is based on the hmmmix model introduced by [Sha08]. We show how it can be trained with variational methods to achieve better results and make it more flexible. We show how this allows for soft patient clusterings and how it partly addresses the difficult issue of determining the number of clusters to use. We compare the performance of our method with that of [Sha08] using their original benchmark test as well as with synthetic data generated from the hmmmix model itself. We show how our method can be parallelized and adapted to huge datasets. Science, Faculty of; Computer Science, Department of; Graduate 2009-08-11T15:29:17Z 2009-08-11T15:29:17Z 2009 2009-11 Text Thesis/Dissertation http://hdl.handle.net/2429/11992 eng Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ 2502822 bytes application/pdf University of British Columbia |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
description |
DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. Copy number variations (of which CNAs are a subset) are a common phenomenon and not much is known about the nature of many of the mutations. By clustering patients according to CNA patterns, we can identify recurrent CNAs and understand molecular heterogeneity. This differs from normal distance-based clustering that doesn’t exploit the sequential structure of the data.
Our approach is based on the hmmmix model introduced by [Sha08]. We show how it can be trained with variational methods to achieve better results and make it more flexible. We show how this allows for soft patient clusterings and how it partly addresses the difficult issue of determining the number of clusters to use. We compare the performance of our method with that of [Sha08] using their original benchmark test as well as with synthetic data generated from the hmmmix model itself. We show how our method can be parallelized and adapted to huge datasets. === Science, Faculty of === Computer Science, Department of === Graduate |
author |
Alain, Guillaume |
title |
Model-based clustering for aCGH data using variational EM |
publisher |
University of British Columbia |
publishDate |
2009 |
url |
http://hdl.handle.net/2429/11992 |