Weld: A Common Runtime for High Performance Data Analytics

© 2017 Conference on Innovative Data Systems Research (CIDR). All rights reserved. Modern analytics applications combine multiple functions from different libraries and frameworks to build increasingly complex workflows. Even though each function may achieve high performance in isolation, the perfor...

Bibliographic Details
Main Authors: Palkar, Shoumik (Author), Thomas, James J. (Author), Shanbhag, Anil Atmanand (Author), Narayanan, Deepak (Author), Pirk, Holger (Author), Schwarzkopf, Malte (Author), Amarasinghe, Saman P (Author), Zaharia, Matei (Author)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: 2021-12-07T17:57:02Z.
Online Access: Get fulltext (https://hdl.handle.net/1721.1/137425.2)
LEADER 02088 am a22002533u 4500
001 137425.2
042 |a dc 
100 1 0 |a Palkar, Shoumik  |e author 
100 1 0 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
700 1 0 |a Thomas, James J.  |e author 
700 1 0 |a Shanbhag, Anil Atmanand  |e author 
700 1 0 |a Narayanan, Deepak  |e author 
700 1 0 |a Pirk, Holger  |e author 
700 1 0 |a Schwarzkopf, Malte  |e author 
700 1 0 |a Amarasinghe, Saman P  |e author 
700 1 0 |a Zaharia, Matei  |e author 
245 0 0 |a Weld: A Common Runtime for High Performance Data Analytics 
260 |c 2021-12-07T17:57:02Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/137425.2 
520 |a © 2017 Conference on Innovative Data Systems Research (CIDR). All rights reserved. Modern analytics applications combine multiple functions from different libraries and frameworks to build increasingly complex workflows. Even though each function may achieve high performance in isolation, the performance of the combined workflow is often an order of magnitude below hardware limits due to extensive data movement across the functions. To address this problem, we propose Weld, a runtime for data-intensive applications that optimizes across disjoint libraries and functions. Weld uses a common intermediate representation to capture the structure of diverse data-parallel workloads, including SQL, machine learning and graph analytics. It then performs key data movement optimizations and generates efficient parallel code for the whole workflow. Weld can be integrated incrementally into existing frameworks like TensorFlow, Apache Spark, NumPy and Pandas without changing their user-facing APIs. We show that Weld can speed up these frameworks, as well as applications that combine them, by up to 30×. 
546 |a en 
655 7 |a Article 
773 |t CIDR 2017 - 8th Biennial Conference on Innovative Data Systems Research