An improved systematic approach to predicting transcription factor target genes using support vector machine.

Bibliographic Details
Main Authors: Song Cui, Eunseog Youn, Joohyun Lee, Stephan J Maas
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3990533?pdf=render
Citation: PLoS ONE 9(4): e94519
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0094519
Collection: DOAJ

Abstract
Experimentally identifying transcription factor binding sites and their corresponding transcription factor target genes (TFTGs) contributes greatly to understanding gene regulatory networks. However, these approaches rely on laborious and time-consuming biological experiments. Numerous computational approaches have shown great potential to circumvent such laborious biological methods. However, most of these algorithms offer limited performance and fail to consider the structural properties of the datasets. We propose a refined systematic computational approach for predicting TFTGs. Building on previous work on identifying auxin response factor target genes from Arabidopsis thaliana co-expression data, we adopted a novel reverse-complementary distance-sensitive n-gram profile algorithm. This algorithm converts each upstream sub-sequence into a high-dimensional vector and thereby turns the prediction task into a classification problem, which is solved with a support vector machine (SVM)-based classifier. Under 10-fold cross-validation, our approach showed a significant improvement over other computational methods in the area under the receiver operating characteristic (ROC) curve. In addition, given the highly skewed structure of the dataset, we also evaluated other metrics and their associated curves, such as precision-recall curves and cost curves, which also gave highly satisfactory results.
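The feature-extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes plain DNA strings over A, C, G and T, merges each n-gram with its reverse complement into a single coordinate, and approximates "distance sensitivity" with a hypothetical `n_bins` parameter that splits the sequence into positional windows. All function and parameter names are illustrative.

```python
from itertools import product

# Watson-Crick complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(n: int) -> list[str]:
    """Sorted list of canonical n-grams over ACGT; this fixes the feature order."""
    return sorted({canonical("".join(p)) for p in product("ACGT", repeat=n)})


def ngram_profile(seq: str, n: int = 3, n_bins: int = 1) -> list[float]:
    """Map an upstream sequence to a fixed-length vector of n-gram frequencies.

    Each n-gram shares one coordinate with its reverse complement. The sequence
    is split into `n_bins` positional windows (a hypothetical stand-in for the
    paper's distance sensitivity), and each window gets its own feature block.
    """
    seq = seq.upper()
    vocab = canonical_kmers(n)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    vector = [0.0] * (len(vocab) * n_bins)
    positions = len(seq) - n + 1
    if positions <= 0:
        return vector  # sequence shorter than n: empty profile
    for start in range(positions):
        kmer = seq[start:start + n]
        if set(kmer) - VALID:
            continue  # skip n-grams containing N or other ambiguity codes
        bin_id = start * n_bins // positions  # window this n-gram starts in
        vector[bin_id * len(vocab) + index[canonical(kmer)]] += 1.0
    total = sum(vector)
    return [v / total for v in vector] if total else vector


if __name__ == "__main__":
    upstream = "TGTCTCAAGACAGAGACA"  # toy sequence, not real promoter data
    vec = ngram_profile(upstream, n=3, n_bins=2)
    print(len(vec))  # 32 canonical 3-mers x 2 positional bins = 64
```

With `n_bins=1`, a sequence and its reverse complement produce identical vectors, so the profile does not depend on which strand was sampled. Vectors of this kind could then be passed to any SVM implementation for training.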
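Two of the evaluation measures named in the abstract can also be sketched from classifier scores alone: ROC AUC, computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties counted as half), and average precision, a common one-number summary of the precision-recall curve. These are standard textbook definitions, not the paper's evaluation code; labels are assumed to be 1 for target genes and 0 otherwise, and cost curves are not shown.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win, which matches the trapezoidal ROC area.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p_score in pos:
        for n_score in neg:
            if p_score > n_score:
                wins += 1.0
            elif p_score == n_score:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean of the precision values observed at the rank of each positive.

    Equal scores keep their input order (sorted() is stable), a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / n_pos


if __name__ == "__main__":
    # Toy, imbalanced example: 2 positives among 8 candidates.
    labels = [1, 0, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.05, 0.01]
    print(round(roc_auc(labels, scores), 3))  # 0.917
    print(round(average_precision(labels, scores), 3))  # 0.833
```

On skewed data the two numbers answer different questions: ROC AUC is insensitive to the class ratio, whereas precision is measured against a no-skill baseline equal to the positive prevalence (0.25 in the toy example), which is why precision-recall analysis is informative when target genes are rare.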