FlexGP

We describe FlexGP, the first Genetic Programming system to perform symbolic regression on large-scale datasets on the cloud via massive data-parallel ensemble learning. FlexGP provides a decentralized, fault-tolerant parallelization framework that runs many copies of Multiple Regression Genetic Programming, a sophisticated symbolic regression algorithm, on the cloud. Each copy executes with a different sample of the data and different parameters. The framework can create a fused model or ensemble on demand as the individual GP learners are evolving. We demonstrate our framework by deploying 100 independent GP instances in a massive data-parallel manner to learn from a dataset composed of 515K exemplars and 90 features, and by generating a competitive fused model in less than 10 minutes.
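The core idea in the abstract — many independent learners, each trained on its own data sample, fused into one model on demand — can be sketched in a few lines. This is only an illustration of that data-parallel ensemble pattern, not FlexGP's actual implementation: the toy `train_learner` below (a one-parameter least-squares fit) stands in for a full Multiple Regression Genetic Programming instance, and median fusion is one plausible fusion rule.

```python
import random
import statistics

def train_learner(data, sample_frac=0.5, seed=0):
    # Hypothetical stand-in for one GP learner: each instance fits
    # y = a*x by least squares on its own random subsample of the data,
    # mirroring how each FlexGP copy sees a different data sample.
    rng = random.Random(seed)
    sample = rng.sample(data, max(1, int(len(data) * sample_frac)))
    sxx = sum(x * x for x, _ in sample)
    sxy = sum(x * y for x, y in sample)
    a = sxy / sxx
    return lambda x: a * x

def fuse(models, x):
    # Fuse on demand: take the median prediction across the
    # independently trained learners.
    return statistics.median(m(x) for m in models)

# Synthetic data near y = 3x; ten learners trained on different samples.
data = [(x, 3.0 * x + random.Random(x).uniform(-0.1, 0.1))
        for x in range(1, 51)]
models = [train_learner(data, seed=s) for s in range(10)]
prediction = fuse(models, 10.0)
```

Because each learner sees a different sample, the fused median is robust to any single learner landing on a poor fit — the same motivation the abstract gives for running 100 GP instances in parallel.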


Bibliographic Details
Main Authors: Veeramachaneni, Kalyan (Contributor), Arnaldo, Ignacio (Contributor), Derby, Owen (Contributor), O'Reilly, Una-May (Contributor)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Laboratory for Information and Decision Systems (Contributor)
Format: Article
Language: English
Published: Springer Netherlands, 2016-07-01T20:33:34Z.
Subjects:
Online Access: Get fulltext
LEADER 01788 am a22002773u 4500
001 103516
042 |a dc 
100 1 0 |a Veeramachaneni, Kalyan  |e author 
100 1 0 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Laboratory for Information and Decision Systems  |e contributor 
100 1 0 |a Veeramachaneni, Kalyan  |e contributor 
100 1 0 |a Arnaldo, Ignacio  |e contributor 
100 1 0 |a Derby, Owen  |e contributor 
100 1 0 |a O'Reilly, Una-May  |e contributor 
700 1 0 |a Arnaldo, Ignacio  |e author 
700 1 0 |a Derby, Owen  |e author 
700 1 0 |a O'Reilly, Una-May  |e author 
245 0 0 |a FlexGP 
260 |b Springer Netherlands,   |c 2016-07-01T20:33:34Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/103516 
520 |a We describe FlexGP, the first Genetic Programming system to perform symbolic regression on large-scale datasets on the cloud via massive data-parallel ensemble learning. FlexGP provides a decentralized, fault tolerant parallelization framework that runs many copies of Multiple Regression Genetic Programming, a sophisticated symbolic regression algorithm, on the cloud. Each copy executes with a different sample of the data and different parameters. The framework can create a fused model or ensemble on demand as the individual GP learners are evolving. We demonstrate our framework by deploying 100 independent GP instances in a massive data-parallel manner to learn from a dataset composed of 515K exemplars and 90 features, and by generating a competitive fused model in less than 10 minutes. 
520 |a Li Ka Shing Foundation 
520 |a GE Global Research Center 
546 |a en 
655 7 |a Article 
773 |t Journal of Grid Computing