Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments

Abstract. Background: Motif analysis methods have long been central to studying the biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression...

Bibliographic Details
Main Authors:	Morten Muhlig Nielsen, Paula Tataru, Tobias Madsen, Asger Hobolth, Jakob Skou Pedersen
Format:	Article
Language:	English
Published:	BMC 2018-12-01
Series:	Algorithms for Molecular Biology
Online Access:	http://link.springer.com/article/10.1186/s13015-018-0135-2

id	doaj-61a03bdbb36e4f6dbd5ccac65365961b
record_format	Article
doi	10.1186/s13015-018-0135-2
affiliations	Morten Muhlig Nielsen: Department of Molecular Medicine (MOMA), Aarhus University Hospital; Paula Tataru: Bioinformatics Research Centre, Aarhus University; Tobias Madsen: Department of Molecular Medicine (MOMA), Aarhus University Hospital; Asger Hobolth: Bioinformatics Research Centre, Aarhus University; Jakob Skou Pedersen: Department of Molecular Medicine (MOMA), Aarhus University Hospital
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Morten Muhlig Nielsen Paula Tataru Tobias Madsen Asger Hobolth Jakob Skou Pedersen
title	Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments
publisher	BMC
series	Algorithms for Molecular Biology
issn	1748-7188
publishDate	2018-12-01
description	Abstract. Background: Motif analysis methods have long been central to studying the biological function of nucleotide sequences. Functional genomics experiments extend their potential: they typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools are limited in their ability to search large motif spaces, so more complex motifs may not be included. There is thus a need for motif analysis methods tailored to analyzing specific complex motifs motivated by biological questions and hypotheses, rather than acting as screen-based motif finding tools. Methods: We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank-based statistics. A modular setup and fast analytic p-value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems. Results: We demonstrate use cases of combined motifs on simulated data and on expression data from microRNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity. Conclusions: Regmex is a useful and flexible tool for analyzing motif hypotheses that relate to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).
url	http://link.springer.com/article/10.1186/s13015-018-0135-2

Similar Items