Incorporating chromatin interaction data to improve prediction accuracy of gene expression

Genome structure can be classified into three categories: primary structure, secondary structure and tertiary structure, and they are all important for gene transcription regulation. In this research, we utilize the structural information to characterize the correlations and interactions among genes...

Bibliographic Details
Main Author: Li, Xue
Other Authors: Zheyang Wu, Advisor
Format: Others
Published: Digital WPI 2015
Subjects: genome structure; linear mixed effects model; gene expression prediction
Online Access: https://digitalcommons.wpi.edu/etd-theses/589
https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1588&context=etd-theses
id ndltd-wpi.edu-oai-digitalcommons.wpi.edu-etd-theses-1588
record_format oai_dc
spelling ndltd-wpi.edu-oai-digitalcommons.wpi.edu-etd-theses-1588 2019-03-22T05:50:08Z Li, Xue 2015-04-30T07:00:00Z text application/pdf Masters Theses (All Theses, All Years) Digital WPI Zheyang Wu, Advisor Dmitry Korkin, Reader
collection NDLTD
format Others
sources NDLTD
topic genome structure
linear mixed effects model
gene expression prediction
description Genome structure can be classified into three categories: primary, secondary, and tertiary structure, all of which are important for the regulation of gene transcription. In this research, we use structural information to characterize the correlations and interactions among genes, and incorporate that information into a Linear Mixed-Effects (LME) model to improve the accuracy of gene expression prediction. In particular, chromatin features serve as predictors and each gene is an observation. Before model training and testing, genes are grouped according to genome structural information. We use four gene grouping methods: 1) grouping genes according to sliding windows on the primary structure; 2) grouping anchor genes in chromatin loop structures; 3) grouping genes in CTCF-anchored domains; and 4) grouping genes in the chromatin domains obtained from Hi-C experiments. We compare the prediction accuracy of the LME model with that of a linear regression model. When all chromatin feature predictors are included in the models, LME models based on the primary structure only (Method 1) improve prediction accuracy by up to 1%. Based on the tertiary structure only (Methods 2-4), for the genes that can be grouped according to the tertiary interaction data, LME models improve prediction accuracy by up to 2.1%. For individual chromatin feature predictors, the LME models improve prediction accuracy by 2% to 26%, with larger improvements for chromatin features that have lower original predictive ability. For future research, we propose a model that combines the primary and tertiary structure to infer the correlations among genes and further improve prediction.
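The grouping step in Method 1 can be illustrated with a small sketch. This is one possible reading of "sliding windows on primary structure", not the thesis's actual procedure: the function name, the gene tuples, the window size, and the offset parameter are all hypothetical, and each gene is assumed to be represented by a single base-pair position on its chromosome.

```python
from collections import defaultdict


def group_genes_by_window(genes, window_size, offset=0):
    """Map (chromosome, window index) -> list of gene names.

    genes: iterable of (name, chromosome, position) tuples, position in bp.
    window_size and offset are hypothetical parameters for illustration;
    changing the offset slides the window boundaries along the chromosome.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    groups = defaultdict(list)
    for name, chrom, pos in genes:
        # Genes on the same chromosome that fall into the same window
        # share a group label.
        index = (pos - offset) // window_size
        groups[(chrom, index)].append(name)
    return dict(groups)


# Made-up example genes, not data from the thesis.
genes = [
    ("geneA", "chr1", 10_000),
    ("geneB", "chr1", 45_000),
    ("geneC", "chr1", 120_000),
    ("geneD", "chr2", 30_000),
]

groups = group_genes_by_window(genes, window_size=100_000)
shifted = group_genes_by_window(genes, window_size=100_000, offset=40_000)
```

Labels of this kind could then serve as the grouping factor of a mixed-effects fit, so that genes in the same window share a random effect, while an ordinary linear regression would ignore the labels entirely.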
author2 Zheyang Wu, Advisor
author Li, Xue
title Incorporating chromatin interaction data to improve prediction accuracy of gene expression
publisher Digital WPI
publishDate 2015
url https://digitalcommons.wpi.edu/etd-theses/589
https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1588&context=etd-theses