Incorporating chromatin interaction data to improve prediction accuracy of gene expression
Genome structure can be classified into three categories: primary structure, secondary structure and tertiary structure, and they are all important for gene transcription regulation. In this research, we utilize the structural information to characterize the correlations and interactions among genes...
Main Author: | Li, Xue |
---|---|
Other Authors: | Zheyang Wu, Advisor; Dmitry Korkin, Reader |
Format: | Others |
Published: | Digital WPI 2015 |
Subjects: | genome structure linear mixed effects model gene expression prediction |
Online Access: | https://digitalcommons.wpi.edu/etd-theses/589 https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1588&context=etd-theses |
id |
ndltd-wpi.edu-oai-digitalcommons.wpi.edu-etd-theses-1588 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-wpi.edu-oai-digitalcommons.wpi.edu-etd-theses-1588 2019-03-22T05:50:08Z Incorporating chromatin interaction data to improve prediction accuracy of gene expression Li, Xue Genome structure can be classified into three categories: primary structure, secondary structure and tertiary structure, all of which are important for gene transcription regulation. In this research, we use this structural information to characterize the correlations and interactions among genes, and incorporate it into the Linear Mixed-Effects (LME) model to improve the accuracy of gene expression prediction. In particular, we use chromatin features as predictors and treat each gene as an observation. Before model training and testing, genes are grouped according to the genome structural information. We use four gene grouping methods: 1) grouping genes according to sliding windows on the primary structure; 2) grouping anchor genes in the chromatin loop structure; 3) grouping genes in the CTCF-anchored domain; and 4) grouping genes in the chromatin domains obtained from Hi-C experiments. We compare the prediction accuracy of the LME model with that of the linear regression model. When all chromatin feature predictors are included in the models, LME models based on the primary structure only (Method 1) improve prediction accuracy by up to 1%. Based on the tertiary structure only (Methods 2-4), for the genes that can be grouped according to the tertiary interaction data, LME models improve prediction accuracy by up to 2.1%. For individual chromatin feature predictors, the LME models improve prediction accuracy by 2% to 26%, with larger improvements for chromatin features that have lower original predictive ability. For future research, we propose a model that combines the primary and tertiary structure to infer the correlations among genes and further improve the prediction.
2015-04-30T07:00:00Z text application/pdf https://digitalcommons.wpi.edu/etd-theses/589 https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1588&context=etd-theses Masters Theses (All Theses, All Years) Digital WPI Zheyang Wu, Advisor Dmitry Korkin, Reader genome structure linear mixed effects model gene expression prediction |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
topic |
genome structure linear mixed effects model gene expression prediction |
description |
Genome structure can be classified into three categories: primary structure, secondary structure and tertiary structure, all of which are important for gene transcription regulation. In this research, we use this structural information to characterize the correlations and interactions among genes, and incorporate it into the Linear Mixed-Effects (LME) model to improve the accuracy of gene expression prediction. In particular, we use chromatin features as predictors and treat each gene as an observation. Before model training and testing, genes are grouped according to the genome structural information. We use four gene grouping methods: 1) grouping genes according to sliding windows on the primary structure; 2) grouping anchor genes in the chromatin loop structure; 3) grouping genes in the CTCF-anchored domain; and 4) grouping genes in the chromatin domains obtained from Hi-C experiments. We compare the prediction accuracy of the LME model with that of the linear regression model. When all chromatin feature predictors are included in the models, LME models based on the primary structure only (Method 1) improve prediction accuracy by up to 1%. Based on the tertiary structure only (Methods 2-4), for the genes that can be grouped according to the tertiary interaction data, LME models improve prediction accuracy by up to 2.1%. For individual chromatin feature predictors, the LME models improve prediction accuracy by 2% to 26%, with larger improvements for chromatin features that have lower original predictive ability. For future research, we propose a model that combines the primary and tertiary structure to infer the correlations among genes and further improve the prediction. |
author2 |
Zheyang Wu, Advisor |
author |
Li, Xue |
title |
Incorporating chromatin interaction data to improve prediction accuracy of gene expression |
publisher |
Digital WPI |
publishDate |
2015 |
url |
https://digitalcommons.wpi.edu/etd-theses/589 https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=1588&context=etd-theses |
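
The first grouping method named in the abstract (sliding windows on the primary structure) can be illustrated with a small sketch. This is not the thesis's implementation: the function name `window_groups`, the tuple layout of the input, the window size, and the choice to advance the window by its full width (so that every gene lands in exactly one group) are all assumptions made for illustration, and the gene records are made-up toy data.

```python
from collections import defaultdict


def window_groups(genes, window_size):
    """Assign each gene to a window of consecutive genes on its chromosome.

    genes: iterable of (gene_id, chromosome, position) tuples (hypothetical layout).
    window_size: number of consecutive genes per group (assumed parameter).
    Returns a dict mapping gene_id -> group label such as "chr1:0".
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")

    # Collect genes per chromosome so windows never span two chromosomes.
    by_chrom = defaultdict(list)
    for gene_id, chrom, position in genes:
        by_chrom[chrom].append((position, gene_id))

    groups = {}
    for chrom, entries in by_chrom.items():
        entries.sort()  # order genes by position along the chromosome
        for rank, (_, gene_id) in enumerate(entries):
            # Genes with ranks 0..window_size-1 share window 0, and so on.
            groups[gene_id] = f"{chrom}:{rank // window_size}"
    return groups


# Toy, made-up gene records: (gene_id, chromosome, position).
genes = [
    ("g1", "chr1", 100),
    ("g2", "chr1", 900),
    ("g3", "chr1", 500),
    ("g4", "chr1", 1500),
    ("g5", "chr2", 50),
]
groups = window_groups(genes, window_size=2)
```

Labels produced this way could serve as the grouping factor of a random-intercept mixed model, in which genes sharing a label share a random effect, while an ordinary linear regression would ignore the labels. How the thesis actually defines its windows and random effects is not stated in this record.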