Gene fusion discovery through RNA-seq and inversion detection via optical mapping

Bibliographic Details
Main Authors: Wu, Jikun, 武继坤
Other Authors: Lam, TW
Language: English
Published: The University of Hong Kong (Pokfulam, Hong Kong) 2014
Online Access: http://hdl.handle.net/10722/195960
Description
Summary: RNA-seq has revolutionized whole-transcriptome sequencing and analysis. Its high-throughput, low-cost sequencing has produced an ever-increasing volume of reads, a treasure trove for biological and therapeutic studies. However, because the transcription process is complex and its knowledge base is still relatively undeveloped, many challenges remain in modeling and investigating RNA-seq read data, and efficient computational tools are needed to meet them. The first part of this thesis concentrates on algorithms for both upstream and downstream analysis of RNA-seq data. For the upstream analysis, we tackle the problem of RNA-seq read alignment, where segmental alignment causes the major difficulty. By employing a strategy of rigorous, extensive trials over read segmentation indices, we implemented an accurate algorithm, based on the bi-directional BWT, that returns two-segmental alignments. For the downstream analysis, we study two types of gene fusion events that play a critical role in the formation of cancers. Unlike previous down-scoping-search methods, our framework follows a search-validate approach. By introducing key techniques such as masking, two-segmental alignment and retention of multiple maps, we developed an efficient and robust tool that detects gene fusions with high accuracy, as shown by extensive tests on simulated and real data. Optical mapping is a cutting-edge technique for studying genomic structural variations that addresses the defects and limitations of paired-end sequencing. It was designed to greatly improve accuracy, resolution and throughput over current techniques. It also produces much longer molecules, which enable the exploration of genomic regions rich in repetitive sequences.
Optical mapping has the potential to give a complete picture of genome structural polymorphism, so it is important to design tools for analyzing its data. The second part of the thesis is dedicated to algorithms for both upstream and downstream analysis of optical map data. For the upstream analysis, we formulated a robust scoring function that combines the effectiveness of heuristic functions with the accuracy of statistical functions. Based on it, we implemented the high-performance OMDP algorithm. For the downstream analysis, we developed BP-OMDP, which uses both split-mapping and disparity in coverage depth to call inversions in the NA12878 human genome sample. === published_or_final_version === Computer science === Doctoral === Doctor of Philosophy