Computational Modeling of Cancer Progression
Cancer is a multi-stage process resulting from accumulation of genetic mutations. Data obtained from assaying a tumor only contains the set of mutations in the tumor and lacks information about their temporal order. Learning the chronological order of the genetic mutations is an important step towar...
Main Author: | |
---|---|
Format: | Doctoral Thesis |
Language: | English |
Published: |
KTH, Beräkningsbiologi, CB
2013
|
Subjects: | |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-121597 http://nbn-resolving.de/urn:isbn:978-91-7501-724-2 |
Summary: | Cancer is a multi-stage process resulting from accumulation of genetic mutations. Data obtained from assaying a tumor only contains the set of mutations in the tumor and lacks information about their temporal order. Learning the chronological order of the genetic mutations is an important step towards understanding the disease. The probability of introduction of a mutation to a tumor increases if certain mutations that promote it, already happened. Such dependencies induce what we call the monotonicity property in cancer progression. A realistic model of cancer progression should take this property into account. In this thesis, we present two models for cancer progression and algorithms for learning them. In the first model, we propose Progression Networks (PNs), which are a special class of Bayesian networks. In learning PNs the issue of monotonicity is taken into consideration. The problem of learning PNs is reduced to Mixed Integer Linear Programming (MILP), which is a NP-hard problem for which very good heuristics exist. We also developed a program, DiProg, for learning PNs. In the second model, the problem of noise in the biological experiments is addressed by introducing hidden variable. We call this model Hidden variable Oncogenetic Network (HON). In a HON, there are two variables assigned to each node, a hidden variable that represents the progression of cancer to the node and an observable random variable that represents the observation of the mutation corresponding to the node. We devised a structural Expectation Maximization (EM) algorithm for learning HONs. In the M-step of the structural EM algorithm, we need to perform a considerable number of inference tasks. Because exact inference is tractable only on Bayesian networks with bounded treewidth, we also developed an algorithm for learning bounded treewidth Bayesian networks by reducing the problem to a MILP. Our algorithms performed well on synthetic data. We also tested them on cytogenetic data from renal cell carcinoma. The learned progression networks from both algorithms are in agreement with the previously published results. MicroRNAs are short non-coding RNAs that are involved in post transcriptional regulation. A-to-I editing of microRNAs converts adenosine to inosine in the double stranded RNA. We developed a method for determining editing levels in mature microRNAs from the high-throughput RNA sequencing data from the mouse brain. Here, for the first time, we showed that the level of editing increases with development. === <p>QC 20130503</p> |
---|