Construction of a comprehensive web server for transcript expression profiling using cancer RNA-Seq data

Bibliographic Details
Main Authors: Jian-Rong Li, 李建融
Other Authors: Jim Chun-Chi Liu
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/30982105046545095797
Description
Summary: Master's === National Chung Hsing University === Institute of Genomics and Bioinformatics === 103 === A molecular understanding of carcinogenesis is key to elucidating cancer mechanisms and facilitating personalized medicine. RNA Sequencing (RNA-Seq), a rapidly developing application of next-generation sequencing (NGS) technology, has advanced genetic research in recent years and has been applied in a number of cancer studies, where it offers a revolutionary tool for studying alternative splicing and quantifying gene- and isoform-level expression. However, some current online databases offer NGS data only for download, so researchers must download and analyze the data manually to extract the relevant information. To fill this gap, we constructed the Cancer RNA-Seq Nexus (CRN) database, the first public database providing phenotype-specific coding-transcript/lncRNA expression profiles in cancer cells. We systematically collected RNA-seq datasets from The Cancer Genome Atlas (TCGA) and the NCBI Gene Expression Omnibus (GEO), yielding 43 cancer RNA-seq datasets comprising 242 subsets and 9,199 samples. Each dataset has several phenotype-specific subsets, and each subset contains a group of RNA-seq samples sharing specific phenotypic traits or cancer conditions, e.g., disease state, cell line, cell type, tissue, or genotype. To identify phenotype-specific differentially expressed transcripts (DETs) in each dataset, we selected the subsets with at least 3 samples and then performed a t-test between pairs of subsets that share no samples. To obtain expression profiles for both coding transcripts and lncRNAs, we aligned the RNA-seq reads to the human transcriptome (GENCODE release 21), which includes 93,139 protein-coding and 26,414 lncRNA transcript sequences. Web interface: when users select a cancer name or a cancer subset, the associated subset pairs are listed in the subset-pair panel.
When users select a subset pair, the web server shows a detailed description of the dataset and its subsets, together with the expression profiles of the differentially expressed (DE) protein-coding transcripts and DE lncRNAs. The search panel provides an auto-complete function for quickly searching and selecting partially matched terms. Case study: to demonstrate the biological relevance of CRN, we used the TP63 gene as an example. The upstream promoter of the gene generates the TAp63 isoforms, which contain the N-terminal transactivation (TA) domain, while an alternative internal promoter gives rise to the ΔNp63 isoforms, which lack the TA domain. ΔNp63 isoforms have been suggested to be highly specific to squamous cells and overexpressed in squamous cell carcinoma (SCC), whereas TAp63 expression is lost or extremely low in squamous cells and SCC. In CRN, significant overexpression of ΔNp63 isoforms was observed in lung SCC, whereas in lung adenocarcinoma these isoforms were not overexpressed and did not differ significantly between the cancer and normal subsets. CRN is freely available at http://syslab4.nchu.edu.tw/CRN.
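
The subset-pair selection and t-test described in the summary can be illustrated with a minimal Python sketch. It is an illustration only, not the thesis's code. The summary does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here. The subset names, sample IDs, and expression values are hypothetical, and the helpers `welch_t` and `eligible_pairs` are introduced only for this example. The sketch stops at the t statistic and degrees of freedom; turning these into a p-value would need a t-distribution CDF, which the standard library does not provide.

```python
import math
from itertools import combinations
from statistics import mean, variance

MIN_SAMPLES = 3  # minimum subset size stated in the summary


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Welch's variant is an assumption; the summary only says "t-test".
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    se2 = va + vb
    if se2 == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def eligible_pairs(subsets):
    """Yield pairs of subsets with >= MIN_SAMPLES samples and no shared samples.

    subsets maps a subset name to {sample_id: expression value} for one transcript.
    """
    names = [name for name, samples in subsets.items() if len(samples) >= MIN_SAMPLES]
    for x, y in combinations(names, 2):
        if subsets[x].keys() & subsets[y].keys():
            continue  # the two subsets share at least one sample
        yield x, y


# Hypothetical expression values of a single transcript in one dataset.
subsets = {
    "tumor": {"s1": 10.0, "s2": 12.0, "s3": 14.0},
    "normal": {"s4": 1.0, "s5": 2.0, "s6": 3.0},
    "stage_i": {"s1": 10.0, "s2": 12.0, "s7": 11.0},  # shares samples with "tumor"
    "tiny": {"s8": 5.0, "s9": 6.0},  # fewer than 3 samples, never tested
}

results = {}
for x, y in eligible_pairs(subsets):
    results[(x, y)] = welch_t(list(subsets[x].values()), list(subsets[y].values()))

for pair, (t, df) in results.items():
    print(pair, round(t, 3), round(df, 3))
```

In this toy input only the tumor/normal and normal/stage_i pairs are tested: "tiny" falls below the three-sample minimum, and "tumor" and "stage_i" share samples.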