Model based approaches to array CGH data analysis
DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. CNAs are segments of DNA that have been deleted or amplified and can range in size from one kilobase to whole chromosome arms. Development of array comparative gen...
Main Author: | Shah, Sohrab P. |
---|---|
Format: | Others |
Language: | English |
Published: | University of British Columbia 2008 |
Subjects: | Array CGH; HMM; DNA copy number |
Online Access: | http://hdl.handle.net/2429/2808 |
id |
ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.-2808 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.-2808; 2013-06-05T04:16:52Z; Model based approaches to array CGH data analysis; Shah, Sohrab P.; Array CGH; HMM; DNA copy number; University of British Columbia; 2008-11-24T17:37:28Z; 2008-11-24T17:37:28Z; 2008; 2008-11-24T17:37:28Z; 2009-05; Electronic Thesis or Dissertation; 15032556 bytes; application/pdf; http://hdl.handle.net/2429/2808; eng |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Array CGH; HMM; DNA copy number |
description |
DNA copy number alterations (CNAs) are genetic changes that can produce
adverse effects in numerous human diseases, including cancer. CNAs are
segments of DNA that have been deleted or amplified and can range in size
from one kilobase to whole chromosome arms. Development of array
comparative genomic hybridization (aCGH) technology enables CNAs to be
measured at sub-megabase resolution using tens of thousands of probes.
However, aCGH data are noisy and result in continuous-valued measurements of
the discrete CNAs. Consequently, the data must be processed through
algorithmic and statistical techniques in order to derive meaningful
biological insights. We introduce model-based approaches to analysis of aCGH
data and develop state-of-the-art solutions to three distinct analytical
problems.
In the simplest scenario, the task is to infer CNAs from a single aCGH
experiment. We apply a hidden Markov model (HMM) to accurately identify
CNAs from aCGH data. We show that borrowing statistical strength across
chromosomes and explicitly modeling outliers in the data improves on
baseline models.
In the second scenario, we wish to identify recurrent CNAs in a set of aCGH
data derived from a patient cohort. These are locations in the genome
altered in many patients, providing evidence for CNAs that may be playing
important molecular roles in the disease. We develop a novel hierarchical
HMM profiling method that explicitly models both statistical and biological
noise in the data and is capable of producing a representative profile for a
set of aCGH experiments. We demonstrate that our method is more accurate
than simpler baselines on synthetic data, and show our model produces output
that is more interpretable than that of other methods.
Finally, we develop a model-based clustering framework to stratify a patient
cohort, expected to be composed of a fixed set of molecular subtypes. We
introduce a model that jointly infers CNAs, assigns patients to subgroups
and infers the profiles that represent each subgroup. We show our model to
be more accurate on synthetic data, and show in two patient cohorts how the
model discovers putative novel subtypes and clinically relevant subgroups. |
author |
Shah, Sohrab P. |
title |
Model based approaches to array CGH data analysis |
publisher |
University of British Columbia |
publishDate |
2008 |
url |
http://hdl.handle.net/2429/2808 |