PeakRegressor identifies composite sequence motifs responsible for STAT1 binding sites and their potential rSNPs.

How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present "PeakRegressor," a system that identifies binding motifs by combining DNA-sequence da...

Bibliographic Details
Main Authors: Jean-François Pessiot, Hirokazu Chiba, Hiroto Hyakkoku, Takeaki Taniguchi, Wataru Fujibuchi
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2010-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2929187?pdf=render
id doaj-8bcf3bcb816b49af89341541fa0084ba
record_format Article
spelling doaj-8bcf3bcb816b49af89341541fa0084ba; 2020-11-25T02:39:47Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2010-01-01; vol. 5, no. 8, e11881; doi:10.1371/journal.pone.0011881
collection DOAJ
language English
format Article
sources DOAJ
author Jean-François Pessiot
Hirokazu Chiba
Hiroto Hyakkoku
Takeaki Taniguchi
Wataru Fujibuchi
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2010-01-01
description How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present "PeakRegressor," a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log linear regression in order to predict peak values from binding motif candidates. Our approach successfully predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we could identify composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in the regulation of transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log linear regression achieves the best performance with respect to binding motif identification, biological interpretability and computational efficiency.
url http://europepmc.org/articles/PMC2929187?pdf=render
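
The description names L1-norm log linear regression but this record gives no further detail. As a rough illustration only, and not the authors' implementation, the Python sketch below assumes the model regresses the logarithm of each peak value on counts of motif candidates, with an L1 penalty on the motif weights, fitted by cyclic coordinate descent. The function names, the toy data, and the penalty value are all hypothetical.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the update induced by an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_log_linear(features, peaks, penalty, n_sweeps=200):
    """Fit log(peak) ~ intercept + sum_j w_j * x_j with an L1 penalty on w.

    Minimizes (1 / 2n) * sum_i (log y_i - b - w.x_i)^2 + penalty * sum_j |w_j|
    by cyclic coordinate descent. The intercept b is not penalized.
    """
    n, p = len(features), len(features[0])
    targets = [math.log(y) for y in peaks]
    weights = [0.0] * p
    intercept = sum(targets) / n
    residuals = [t - intercept for t in targets]
    for _ in range(n_sweeps):
        # Re-center the intercept on the current residuals.
        shift = sum(residuals) / n
        intercept += shift
        residuals = [r - shift for r in residuals]
        for j in range(p):
            column = [row[j] for row in features]
            norm = sum(x * x for x in column) / n
            if norm == 0.0:
                continue  # a motif that never occurs carries no information
            # Covariance of motif j with the residual that excludes motif j.
            rho = sum(x * (r + x * weights[j]) for x, r in zip(column, residuals)) / n
            new_weight = soft_threshold(rho, penalty) / norm
            delta = new_weight - weights[j]
            if delta != 0.0:
                residuals = [r - delta * x for r, x in zip(residuals, column)]
                weights[j] = new_weight
    return intercept, weights


def predict_peaks(features, intercept, weights):
    """Map motif counts back to the peak-value scale."""
    return [math.exp(intercept + sum(w * x for w, x in zip(weights, row)))
            for row in features]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def make_toy_data(seed=0, n_regions=200, n_motifs=6):
    """Synthetic data (hypothetical): only motifs 0 and 2 influence the peak value."""
    rng = random.Random(seed)
    features = [[rng.randint(0, 3) for _ in range(n_motifs)] for _ in range(n_regions)]
    peaks = [math.exp(1.0 + 0.8 * row[0] + 0.5 * row[2] + rng.gauss(0.0, 0.1))
             for row in features]
    return features, peaks


if __name__ == "__main__":
    X, y = make_toy_data()
    b, w = fit_l1_log_linear(X, y, penalty=0.05)
    print("intercept:", round(b, 3))
    print("motif weights:", [round(v, 3) for v in w])
    print("correlation:", round(pearson(predict_peaks(X, b, w), y), 3))
```

In this sketch the L1 penalty drives the weights of uninformative motif candidates toward zero, so the motifs that keep non-zero weights are the ones the model treats as predictive of peak height. The correlation between predicted and observed peak values is the kind of score the description reports, although the numbers produced here come from synthetic data and say nothing about the published results.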