Detecting Novel Associations in Large Data Sets

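Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.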
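The abstract does not define MIC. In the full article, MIC is the maximum, over all x-by-y grids with x·y < B(n) (B(n) = n^0.6 in the paper), of the mutual information the grid induces on the data, normalized by log min(x, y). The sketch below illustrates that normalization and search under a loud simplifying assumption: it only tries rank-based equal-frequency grids on both axes, instead of the authors' dynamic-programming search over column partitions. It is therefore a lower-bound approximation, weakest for non-monotone relationships. The function names are hypothetical; this is not the MINE software.

```python
import math
import random
from collections import Counter


def _rank_bins(values, k):
    """Assign each value to one of k roughly equal-frequency bins by rank.

    Simplification: ties are broken by position, not kept in the same bin.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def _mutual_information(xb, yb):
    """Plug-in mutual information (in nats) between two lists of bin labels."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px = Counter(xb)
    py = Counter(yb)
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def mic_equipartition(x, y, alpha=0.6):
    """Approximate MIC using only equal-frequency grids (hypothetical helper).

    Searches grids with a * b < n**alpha and returns the largest
    I(grid) / log(min(a, b)); the log base cancels in this ratio.
    """
    n = len(x)
    limit = n ** alpha  # B(n) = n^alpha; the paper uses alpha = 0.6
    best = 0.0
    for a in range(2, int(limit) + 1):
        xb = _rank_bins(x, a)
        for b in range(2, int(limit) + 1):
            if a * b >= limit:
                break  # grid too large for this sample size
            yb = _rank_bins(y, b)
            score = _mutual_information(xb, yb) / math.log(min(a, b))
            best = max(best, score)
    return best
```

For a noiseless linear relationship the 2-by-2 grid already scores 1, matching the claim that MIC approaches R² = 1 for noiseless functions, while two independent uniform samples score far lower (not exactly 0, because plug-in mutual information is biased upward at finite n).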


Bibliographic Details
Main Authors: Reshef, David N. (Contributor), Reshef, Yakir (Contributor), Grossman, Sharon Rachel (Contributor), Finucane, Hilary Kiyo (Author), McVean, Gilean (Author), Turnbaugh, Peter J. (Author), Mitzenmacher, Michael (Author), Sabeti, Pardis C. (Author), Lander, Eric Steven (Author)
Other Authors: Whitaker College of Health Sciences and Technology (Contributor), Massachusetts Institute of Technology. Department of Biology (Contributor), Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor), Lander, Eric S. (Contributor)
Format: Article
Language: English
Published: American Association for the Advancement of Science (AAAS), 2014-02-03T13:18:52Z.
Online Access: http://hdl.handle.net/1721.1/84636
LEADER 02137 am a22003373u 4500
001 84636
042 |a dc 
100 1 0 |a Reshef, David N.  |e author 
100 1 0 |a Whitaker College of Health Sciences and Technology  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Department of Biology  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
100 1 0 |a Reshef, David N.  |e contributor 
100 1 0 |a Reshef, Yakir  |e contributor 
100 1 0 |a Grossman, Sharon Rachel  |e contributor 
100 1 0 |a Lander, Eric S.  |e contributor 
700 1 0 |a Reshef, Yakir  |e author 
700 1 0 |a Grossman, Sharon Rachel  |e author 
700 1 0 |a Finucane, Hilary Kiyo  |e author 
700 1 0 |a McVean, Gilean  |e author 
700 1 0 |a Turnbaugh, Peter J.  |e author 
700 1 0 |a Mitzenmacher, Michael  |e author 
700 1 0 |a Sabeti, Pardis C.  |e author 
700 1 0 |a Lander, Eric Steven  |e author 
245 0 0 |a Detecting Novel Associations in Large Data Sets 
260 |b American Association for the Advancement of Science (AAAS),   |c 2014-02-03T13:18:52Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/84636 
520 |a Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships. 
520 |a National Institute of General Medical Sciences (U.S.) (Medical Scientist Training Program) 
546 |a en_US 
655 7 |a Article 
773 |t Science