Model based approaches to array CGH data analysis

DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. CNAs are segments of DNA that have been deleted or amplified and can range in size from one kilobase to whole chromosome arms. Development of array comparative gen...


Bibliographic Details
Main Author: Shah, Sohrab P.
Format: Others
Language: English
Published: University of British Columbia 2008
Subjects:
HMM
Online Access: http://hdl.handle.net/2429/2808
id ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.-2808
record_format oai_dc
spelling ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.-2808
2013-06-05T04:16:52Z
Model based approaches to array CGH data analysis
Shah, Sohrab P.
Array CGH
HMM
DNA copy number
University of British Columbia
2008-11-24T17:37:28Z
2008-11-24T17:37:28Z
2008
2008-11-24T17:37:28Z
2009-05
Electronic Thesis or Dissertation
15032556 bytes
application/pdf
http://hdl.handle.net/2429/2808
eng
collection NDLTD
language English
format Others
sources NDLTD
topic Array CGH
HMM
DNA copy number
description DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. CNAs are segments of DNA that have been deleted or amplified and can range in size from one kilobase to whole chromosome arms. Development of array comparative genomic hybridization (aCGH) technology enables CNAs to be measured at sub-megabase resolution using tens of thousands of probes. However, aCGH data are noisy and result in continuous-valued measurements of the discrete CNAs. Consequently, the data must be processed through algorithmic and statistical techniques in order to derive meaningful biological insights. We introduce model-based approaches to analysis of aCGH data and develop state-of-the-art solutions to three distinct analytical problems. In the simplest scenario, the task is to infer CNAs from a single aCGH experiment. We apply a hidden Markov model (HMM) to accurately identify CNAs from aCGH data. We show that borrowing statistical strength across chromosomes and explicitly modeling outliers in the data improve on baseline models. In the second scenario, we wish to identify recurrent CNAs in a set of aCGH data derived from a patient cohort. These are locations in the genome altered in many patients, providing evidence for CNAs that may be playing important molecular roles in the disease. We develop a novel hierarchical HMM profiling method that explicitly models both statistical and biological noise in the data and is capable of producing a representative profile for a set of aCGH experiments. We demonstrate that our method is more accurate than simpler baselines on synthetic data, and show that our model produces output that is more interpretable than other methods. Finally, we develop a model-based clustering framework to stratify a patient cohort, expected to be composed of a fixed set of molecular subtypes. We introduce a model that jointly infers CNAs, assigns patients to subgroups and infers the profiles that represent each subgroup. We show our model to be more accurate on synthetic data, and show in two patient cohorts how the model discovers putative novel subtypes and clinically relevant subgroups.
author Shah, Sohrab P.
title Model based approaches to array CGH data analysis
publisher University of British Columbia
publishDate 2008
url http://hdl.handle.net/2429/2808