Incorporating chromatin interaction data to improve prediction accuracy of gene expression

Genome structure can be classified into three categories: primary structure, secondary structure and tertiary structure, and they are all important for gene transcription regulation. In this research, we utilize the structural information to characterize the correlations and interactions among genes...


Bibliographic Details
Main Author:	Li, Xue
Other Authors:	Zheyang Wu, Advisor
Format:	Others
Published:	Digital WPI 2015
Subjects:	genome structure linear mixed effects model gene expression prediction
Online Access:	https://digitalcommons.wpi.edu/etd-theses/589 https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1588&context=etd-theses

id	ndltd-wpi.edu-oai-digitalcommons.wpi.edu-etd-theses-1588
record_format	oai_dc
spelling	ndltd-wpi.edu-oai-digitalcommons.wpi.edu-etd-theses-1588 2019-03-22T05:50:08Z Incorporating chromatin interaction data to improve prediction accuracy of gene expression Li, Xue
2015-04-30T07:00:00Z text application/pdf https://digitalcommons.wpi.edu/etd-theses/589 https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1588&context=etd-theses Masters Theses (All Theses, All Years) Digital WPI Zheyang Wu, Advisor Dmitry Korkin, Reader genome structure linear mixed effects model gene expression prediction
collection	NDLTD
format	Others
sources	NDLTD
topic	genome structure linear mixed effects model gene expression prediction
spellingShingle	genome structure linear mixed effects model gene expression prediction Li, Xue Incorporating chromatin interaction data to improve prediction accuracy of gene expression
description	Genome structure can be classified into three categories: primary structure, secondary structure, and tertiary structure, all of which are important for gene transcription regulation. In this research, we use structural information to characterize the correlations and interactions among genes, and incorporate this information into a Linear Mixed-Effects (LME) model to improve the accuracy of gene expression prediction. In particular, we use chromatin features as predictors, and each gene is an observation. Before model training and testing, genes are grouped according to the genome structural information. We use four gene grouping methods: 1) grouping genes according to sliding windows on the primary structure; 2) grouping anchor genes in chromatin loop structures; 3) grouping genes in CTCF-anchored domains; and 4) grouping genes in the chromatin domains obtained from Hi-C experiments. We compare the prediction accuracy of the LME model with that of the linear regression model. When all chromatin feature predictors are included in the models, the LME models based on the primary structure only (Method 1) improve prediction accuracy by up to 1%. Based on the tertiary structure only (Methods 2-4), for the genes that can be grouped according to the tertiary interaction data, the LME models improve prediction accuracy by up to 2.1%. For individual chromatin feature predictors, the LME models improve prediction accuracy by 2% to 26%, and the improvement is more pronounced for chromatin features with lower original predictive ability. For future research, we propose a model that combines the primary and tertiary structures to infer the correlations among genes and further improve prediction.
author2	Zheyang Wu, Advisor
author_facet	Zheyang Wu, Advisor Li, Xue
author	Li, Xue
author_sort	Li, Xue
title	Incorporating chromatin interaction data to improve prediction accuracy of gene expression
title_short	Incorporating chromatin interaction data to improve prediction accuracy of gene expression
title_full	Incorporating chromatin interaction data to improve prediction accuracy of gene expression
title_fullStr	Incorporating chromatin interaction data to improve prediction accuracy of gene expression
title_full_unstemmed	Incorporating chromatin interaction data to improve prediction accuracy of gene expression
title_sort	incorporating chromatin interaction data to improve prediction accuracy of gene expression
publisher	Digital WPI
publishDate	2015
url	https://digitalcommons.wpi.edu/etd-theses/589 https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1588&context=etd-theses
work_keys_str_mv	AT lixue incorporatingchromatininteractiondatatoimprovepredictionaccuracyofgeneexpression
_version_	1719006246128844800
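
The description above compares a linear regression with an LME model whose groups come from genome structure. The sketch below illustrates one way such a comparison could be set up for grouping Method 1 with a single chromatin feature. It is not the thesis's implementation, which this record does not show. The function names (`window_groups`, `fit_ols`, `group_offsets`), the use of disjoint fixed-width windows in place of sliding windows, and the assumption that the ratio of residual variance to group variance is known are all illustrative simplifications.

```python
from collections import defaultdict


def window_groups(genes, width):
    """Assign each gene to a fixed-width window on its chromosome.

    genes: iterable of (gene_id, chromosome, position) tuples.
    width: window width in base pairs (hypothetical parameter).
    Returns {gene_id: (chromosome, window_index)}.
    Assumption: windows are disjoint, so every gene has exactly one group.
    """
    return {gid: (chrom, pos // width) for gid, chrom, pos in genes}


def fit_ols(x, y):
    """One-predictor least squares; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def group_offsets(groups, residuals, ratio):
    """Shrunken per-group mean residual (random-intercept predictor).

    ratio: residual variance divided by group variance, assumed known.
    Each offset is sum(residuals in group) / (group size + ratio), so
    small groups are pulled more strongly toward zero.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for g, r in zip(groups, residuals):
        sums[g] += r
        counts[g] += 1
    return {g: sums[g] / (counts[g] + ratio) for g in sums}


def predict(x_new, group, intercept, slope, offsets):
    """Regression prediction plus the group offset (0 for unseen groups)."""
    return intercept + slope * x_new + offsets.get(group, 0.0)
```

With `ratio` very large, every offset shrinks toward zero and `predict` reduces to the plain linear regression, which gives a direct baseline for the comparison described in the abstract. A full LME fit would estimate the variance components from the data rather than take `ratio` as given.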

Similar Items