Hidden Markov Model inference of copy number change in array-CGH data

Bibliographic Details
Main Author: Zhang, Yunyu
Other Authors: Lynda Chin, Cheng Li and Cameron W. Brennan.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2006
Online Access: http://hdl.handle.net/1721.1/33086
Description
Summary: Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2005.

Includes bibliographical references (p. 56-57).

Cancer development and progression typically feature genomic instability, frequently resulting in genomic changes that involve DNA copy number gains or losses. Identifying the genomic locations of these regional alterations provides important opportunities to discover potential novel oncogenes and tumor suppressors. Recently, array-based comparative genomic hybridization (array-CGH) has become available as a powerful approach for genome-wide detection of DNA copy number changes. Array-CGH assesses DNA copy number in tumor samples through competitive hybridization on microarrays containing probes for thousands of genes. The resulting datasets are complex and require statistical methods to define discrete, uniform copy number levels accurately and to identify transitions between genomic regions with altered copy number. Several approaches based on different statistical frameworks have been developed. However, a fundamental informatic issue in array-CGH analysis remains unsolved by these methods. In particular, sample-specific data compression, which arises because tumor cells are commonly admixed with normal cells in many tumor types, must be accounted for in each sample analyzed. Additionally, to assess deviations from normal copy number accurately, the copy number readout must be shifted so that it faithfully represents the baseline copy number of each tumor sample. Failure to address these issues appropriately reduces the accuracy of the data in hard-threshold-based high-level analysis.

By using hidden Markov models (HMMs), a natural framework for modeling the distribution of array-CGH signals, a method to infer absolute copy number and identify change points has been developed to address the problems above. This method has been validated on an independent dataset, and its utility for inference on array-CGH data is demonstrated here.

by Yunyu Zhang.

S.M.
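The abstract does not specify the model, so what follows is only a minimal sketch of the general idea, not the thesis's method. It assumes (1) discrete absolute copy-number states 1 to 4, (2) Gaussian noise on log2 tumor/reference ratios, (3) that admixed normal cells contribute two copies each, so a tumor purity below 1 compresses every ratio toward 0, and (4) an additive baseline shift. The parameter names `purity`, `shift`, `sd` and `stay_prob`, the helper names, and all default values are hypothetical. Viterbi decoding returns the most likely copy-number path, and change points are the probes where that path switches state.

```python
import math

# Hypothetical state set for illustration; copy number 0 is omitted for simplicity.
COPY_STATES = [1, 2, 3, 4]


def expected_log2_ratio(copy_number, purity, shift):
    """Expected log2(tumor/reference) at a locus with the given tumor copy number.

    Assumption: normal cells (fraction 1 - purity) carry two copies, which
    compresses the ratio toward 0; `shift` moves the whole baseline.
    """
    mixed = purity * copy_number + (1.0 - purity) * 2.0
    return math.log2(mixed / 2.0) + shift


def gaussian_log_pdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_copy_number(log2_ratios, purity=0.7, shift=0.0, sd=0.15, stay_prob=0.99):
    """Most likely copy-number path under a Gaussian-emission HMM (sketch)."""
    if not log2_ratios:
        return []
    n_states = len(COPY_STATES)
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]
    log_start = math.log(1.0 / n_states)
    means = [expected_log2_ratio(c, purity, shift) for c in COPY_STATES]

    # score[s]: best log-probability of any state path ending in s at the current probe
    score = [log_start + gaussian_log_pdf(log2_ratios[0], means[s], sd) for s in range(n_states)]
    backptr = []
    for x in log2_ratios[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + gaussian_log_pdf(x, means[s], sd))
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final state to recover the full path
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [COPY_STATES[s] for s in path]


def change_points(copy_numbers):
    """Probe indices i where the inferred copy number differs from probe i - 1."""
    return [i for i in range(1, len(copy_numbers)) if copy_numbers[i] != copy_numbers[i - 1]]


if __name__ == "__main__":
    # Toy log2 ratios (made up): a single-copy gain in the middle of the segment
    ratios = [0.01, -0.02, 0.0, 0.03, 0.44, 0.42, 0.45, 0.43, 0.41, -0.01, 0.02, 0.0]
    cn = viterbi_copy_number(ratios)
    print(cn, change_points(cn))
```

Here `purity` and `shift` are fixed inputs. A fuller treatment would estimate them per sample, which is the sample-specific adjustment the abstract calls for. One option would be to maximize the HMM likelihood over a grid of candidate values.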