Tools and frameworks for data abstraction in a performance context

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-s...

Bibliographic Details
Main Author:	Chen, Alexander Y
Other Authors:	Alan Edelman.
Format:	Others
Language:	English
Published:	Massachusetts Institute of Technology 2017
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/112835

id	ndltd-MIT-oai-dspace.mit.edu-1721.1-112835
record_format	oai_dc
spelling	ndltd-MIT-oai-dspace.mit.edu-1721.1-1128352019-05-02T16:09:17Z Tools and frameworks for data abstraction in a performance context Chen, Alexander Y Alan Edelman. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 109-111). As data science is impacting more and more fields and proving to be effective in a wide variety of applications, the importance of easy-to-understand, high-performance data science tools is growing. Tools tend to exhibit one of two general forms: composable or template-based. We have researched and developed examples of each of these forms. The first project is an implementation of the D4M schema in the Julia language. This implementation has been tested to be faster than the optimized versions in both Matlab and Octave. With this combination of technology, we hope to provide an effective means to represent data and compute on them. This implementation enables an interface with the common DataFrame representation used in data science. We also implemented a D4M.jl interface with an emerging database technology, TileDB. The second project is the MEDIC framework, aiming to map the process of taking a common machine learning engine into a streaming context. We implemented two versions of our solution to the Twitter Trend Prediction problem: one in Julia and one in Spark. We have verified our solution is valid by comparing the Julia version with a previous result that is in Mr. Stanislav Nikolov's master thesis, named Trend or No Trend. We have also verified our solution with the Spark Streaming engine. by Alexander Y Chen. M. Eng. 2017-12-20T17:24:39Z 2017-12-20T17:24:39Z 2017 2017 Thesis http://hdl.handle.net/1721.1/112835 1015200950 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 111 pages application/pdf Massachusetts Institute of Technology
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Electrical Engineering and Computer Science.
spellingShingle	Electrical Engineering and Computer Science. Chen, Alexander Y Tools and frameworks for data abstraction in a performance context
description	Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-submitted PDF version of thesis. === Includes bibliographical references (pages 109-111). === As data science is impacting more and more fields and proving to be effective in a wide variety of applications, the importance of easy-to-understand, high-performance data science tools is growing. Tools tend to exhibit one of two general forms: composable or template-based. We have researched and developed examples of each of these forms. The first project is an implementation of the D4M schema in the Julia language. This implementation has been tested to be faster than the optimized versions in both Matlab and Octave. With this combination of technology, we hope to provide an effective means to represent data and compute on them. This implementation enables an interface with the common DataFrame representation used in data science. We also implemented a D4M.jl interface with an emerging database technology, TileDB. The second project is the MEDIC framework, aiming to map the process of taking a common machine learning engine into a streaming context. We implemented two versions of our solution to the Twitter Trend Prediction problem: one in Julia and one in Spark. We have verified our solution is valid by comparing the Julia version with a previous result that is in Mr. Stanislav Nikolov's master thesis, named Trend or No Trend. We have also verified our solution with the Spark Streaming engine. === by Alexander Y Chen. === M. Eng.
author2	Alan Edelman.
author_facet	Alan Edelman. Chen, Alexander Y
author	Chen, Alexander Y
author_sort	Chen, Alexander Y
title	Tools and frameworks for data abstraction in a performance context
title_short	Tools and frameworks for data abstraction in a performance context
title_full	Tools and frameworks for data abstraction in a performance context
title_fullStr	Tools and frameworks for data abstraction in a performance context
title_full_unstemmed	Tools and frameworks for data abstraction in a performance context
title_sort	tools and frameworks for data abstraction in a performance context
publisher	Massachusetts Institute of Technology
publishDate	2017
url	http://hdl.handle.net/1721.1/112835
work_keys_str_mv	AT chenalexandery toolsandframeworksfordataabstractioninaperformancecontext
_version_	1719035188061667328
