PeakRegressor identifies composite sequence motifs responsible for STAT1 binding sites and their potential rSNPs.
How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present "PeakRegressor," a system that identifies binding motifs by combining DNA-sequence da...
Main Authors: | Jean-François Pessiot, Hirokazu Chiba, Hiroto Hyakkoku, Takeaki Taniguchi, Wataru Fujibuchi |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2010-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC2929187?pdf=render |
id |
doaj-8bcf3bcb816b49af89341541fa0084ba |
---|---|
record_format |
Article |
spelling |
doaj-8bcf3bcb816b49af89341541fa0084ba; 2020-11-25T02:39:47Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2010-01-01; 5(8): e11881; doi:10.1371/journal.pone.0011881; PeakRegressor identifies composite sequence motifs responsible for STAT1 binding sites and their potential rSNPs.; Jean-François Pessiot, Hirokazu Chiba, Hiroto Hyakkoku, Takeaki Taniguchi, Wataru Fujibuchi; http://europepmc.org/articles/PMC2929187?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jean-François Pessiot, Hirokazu Chiba, Hiroto Hyakkoku, Takeaki Taniguchi, Wataru Fujibuchi |
title |
PeakRegressor identifies composite sequence motifs responsible for STAT1 binding sites and their potential rSNPs. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2010-01-01 |
description |
How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present "PeakRegressor," a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log linear regression in order to predict peak values from binding motif candidates. Our approach successfully predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we could identify composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in the regulation of transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log linear regression achieves the best performance with respect to binding motif identification, biological interpretability and computational efficiency. |
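
The regression step named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "L1-norm log linear regression" means a linear model fitted to the logarithm of the peak values with an L1 penalty on the motif weights, and it solves that problem with plain coordinate descent. The function names, the penalty strength `lam`, and the synthetic motif counts are all hypothetical and exist only for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_log_linear(features, peaks, lam=0.05, n_sweeps=100):
    """Fit log(peak) ~ intercept + features . weights with an L1 penalty.

    Minimises (1/2n) * sum(residual^2) + lam * sum(|w_j|) by coordinate
    descent on mean-centred features. Assumed formulation, see lead-in.
    """
    n, p = len(features), len(features[0])
    log_peaks = [math.log(v) for v in peaks]
    y_mean = sum(log_peaks) / n
    x_mean = [sum(row[j] for row in features) / n for j in range(p)]
    centred = [[row[j] - x_mean[j] for j in range(p)] for row in features]
    col_sq = [sum(centred[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    weights = [0.0] * p
    resid = [y - y_mean for y in log_peaks]  # residuals when all weights are 0
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no information
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(
                centred[i][j] * (resid[i] + centred[i][j] * weights[j])
                for i in range(n)
            ) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - weights[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= centred[i][j] * delta
                weights[j] = new_w
    intercept = y_mean - sum(x_mean[j] * weights[j] for j in range(p))
    return intercept, weights


def predict_peaks(features, intercept, weights):
    """Map motif features back to the peak-value scale."""
    return [
        math.exp(intercept + sum(w * x for w, x in zip(weights, row)))
        for row in features
    ]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic data (hypothetical): counts of five candidate motifs per region,
# of which only the first and fourth truly influence the peak value.
rng = random.Random(0)
true_weights = [0.8, 0.0, 0.0, -0.5, 0.0]
motif_counts = [[rng.randint(0, 3) for _ in true_weights] for _ in range(200)]
peak_values = [
    math.exp(1.0 + sum(w * x for w, x in zip(true_weights, row)) + rng.gauss(0.0, 0.1))
    for row in motif_counts
]

intercept, weights = fit_l1_log_linear(motif_counts, peak_values)
r = pearson(peak_values, predict_peaks(motif_counts, intercept, weights))
print("weights:", [round(w, 3) for w in weights])
print("Pearson r:", round(r, 3))
```

In this toy setting the L1 penalty drives the weights of uninformative motif candidates toward zero, which is the property that makes the surviving nonzero weights readable as selected motifs. The correlation between predicted and observed peak values is the kind of score the description reports, although the values obtained on synthetic data say nothing about the published results.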
url |
http://europepmc.org/articles/PMC2929187?pdf=render |