Machine Learning for Variant Detection and Population Analysis in Heterogeneous Cancer Sample

Bibliographic Details
Main Author: Jiao, Wei
Other Authors: Stein, Lincoln
Language: en_ca
Published: 2013
Online Access: http://hdl.handle.net/1807/42971
Description
Summary: Cancer is a complex and deadly disease caused by genetic lesions in somatic cells. Further research into computational methods for detecting and characterizing somatic mutations is necessary to build a comprehensive systems-level model of the roles those lesions play in cancer development. In the first project, I trained a set of supervised machine learning classifiers to distinguish false positive from true positive somatic single nucleotide variants (SNVs), and showed an improvement in somatic SNV detection on the data set over the previously reported classifier. In the second project, we developed the PhyloSub model, which uses a nonparametric Bayesian prior over a set of trees to cluster SNVs and to infer, with uncertainty, the subclonal phylogenetic structure of tumors from SNV sequencing data. Experiments showed that the PhyloSub model could infer the subclonal phylogenetic structure from both single and multiple tumor samples.