Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping


Bibliographic Details
Main Authors:	Puccinelli, Robert (Author), Kim, Ryan (Author), Fordyce, Polly (Author), Orenstein, Yaron (Contributor), Berger Leighton, Bonnie (Contributor)
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Mathematics (Contributor)
Format:	Article
Language:	English
Published:	Elsevier, 2018-05-16T13:17:47Z.
Subjects:	Article
Online Access:	http://hdl.handle.net/1721.1/115384


LEADER	02435 am a22002413u 4500
001	115384
042			|a dc
100	1	0	|a Puccinelli, Robert |e author
100	1	0	|a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |e contributor
100	1	0	|a Massachusetts Institute of Technology. Department of Mathematics |e contributor
100	1	0	|a Orenstein, Yaron |e contributor
100	1	0	|a Berger Leighton, Bonnie |e contributor
700	1	0	|a Kim, Ryan |e author
700	1	0	|a Fordyce, Polly |e author
700	1	0	|a Orenstein, Yaron |e author
700	1	0	|a Berger Leighton, Bonnie |e author
245	0	0	|a Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping
260			|b Elsevier, |c 2018-05-16T13:17:47Z.
856			|z Get fulltext |u http://hdl.handle.net/1721.1/115384
520			|a Sequence libraries that cover all k-mers enable universal, unbiased measurements of binding to both oligonucleotides and peptides. While the number of k-mers grows exponentially in k, space on all experimental platforms is limited. Here, we shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet simultaneously. We present the JokerCAKE (joker covering all k-mers) algorithm for generating a short sequence such that each k-mer appears at least p times with at most one joker character per k-mer. By running our algorithm on a range of parameters and alphabets, we show that JokerCAKE produces near-optimal sequences. Moreover, through comparison with data from hundreds of DNA-protein binding experiments and with new experimental results for both standard and JokerCAKE libraries, we establish that accurate binding scores can be inferred for high-affinity k-mers using JokerCAKE libraries. JokerCAKE libraries allow researchers to search a significantly larger sequence space using the same number of experimental measurements and at the same cost. We present a new compact sequence design that covers all k-mers utilizing joker characters and develop an efficient algorithm to generate such designs. We show through simulations and experimental validation that these sequence designs are useful for identifying high-affinity binding sites at significantly reduced cost and space. Keywords: sequence libraries; microarray design; de Bruijn graph
520			|a National Institutes of Health (U.S.) (Grant R01GM081871)
655	7		|a Article
773			|t Cell Systems
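The abstract frames library design as covering every k-mer over an alphabet, and lists de Bruijn graphs among its keywords. The sketch below is not JokerCAKE and does not reproduce the authors' method. It builds the classic joker-free baseline (a linear de Bruijn sequence, i.e. p = 1, via the standard Lyndon-word/FKM construction, of length |Σ|^k + k − 1), plus a coverage counter that assumes a joker matches every alphabet character, so a window with one joker covers |Σ| k-mers at once. The joker symbol `*` and the names `de_bruijn`, `kmer_counts` and `covers_all` are hypothetical illustrations.

```python
from itertools import product
from typing import Dict, List

# Hypothetical joker symbol; the record does not say how jokers are written.
JOKER = "*"


def de_bruijn(alphabet: str, k: int) -> str:
    """Linear de Bruijn sequence: every k-mer over `alphabet` appears exactly once.

    Standard FKM (Lyndon-word) construction. This is the joker-free, p = 1
    baseline only, not the JokerCAKE algorithm.
    """
    n = len(alphabet)
    a = [0] * (k + 1)
    seq: List[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            # Emit the Lyndon word a[1..p] when p divides k.
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    cyclic = "".join(alphabet[i] for i in seq)
    # Wrap the first k-1 characters so every cyclic k-mer is also a linear window.
    return cyclic + cyclic[: k - 1]


def kmer_counts(seq: str, alphabet: str, k: int) -> Dict[str, int]:
    """Count how many length-k windows of `seq` cover each k-mer.

    Assumption: a joker matches any alphabet character, and windows with more
    than one joker are not counted (mirroring "at most one joker per k-mer").
    """
    counts = {"".join(t): 0 for t in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        jokers = [j for j, c in enumerate(window) if c == JOKER]
        if len(jokers) > 1:
            continue
        if not jokers:
            counts[window] += 1
        else:
            # One joker: this window covers |alphabet| distinct k-mers.
            j = jokers[0]
            for c in alphabet:
                counts[window[:j] + c + window[j + 1:]] += 1
    return counts


def covers_all(seq: str, alphabet: str, k: int, p: int = 1) -> bool:
    """True if every k-mer is covered at least p times."""
    return all(v >= p for v in kmer_counts(seq, alphabet, k).values())
```

For example, over the alphabet `AC` with k = 2, `de_bruijn("AC", 2)` returns the 5-character `"AACCA"`, while the hand-built string `"A*C*"` covers all four 2-mers in 4 characters (AC and CC twice, AA and CA once). This only illustrates why jokers can shorten a library; it says nothing about how JokerCAKE places jokers, handles p > 1, or reaches near-optimal length.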