Tools and frameworks for data abstraction in a performance context

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-s...


Bibliographic Details
Main Author: Chen, Alexander Y
Other Authors: Alan Edelman.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2017
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/112835
id ndltd-MIT-oai-dspace.mit.edu-1721.1-112835
record_format oai_dc
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-112835 2019-05-02T16:09:17Z Tools and frameworks for data abstraction in a performance context Chen, Alexander Y Alan Edelman. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. by Alexander Y Chen. M. Eng. 2017-12-20T17:24:39Z 2017-12-20T17:24:39Z 2017 2017 Thesis http://hdl.handle.net/1721.1/112835 1015200950 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 111 pages application/pdf Massachusetts Institute of Technology
collection NDLTD
language English
format Others
sources NDLTD
topic Electrical Engineering and Computer Science.
spellingShingle Electrical Engineering and Computer Science.
Chen, Alexander Y
Tools and frameworks for data abstraction in a performance context
description Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 109-111). As data science reaches more fields and proves effective in a wide variety of applications, easy-to-understand, high-performance data science tools are becoming more important. Such tools tend to take one of two general forms: composable or template-based. We have researched and developed an example of each form. The first project is an implementation of the D4M schema in the Julia language. In our tests, this implementation ran faster than the optimized versions in both Matlab and Octave. With this combination of technologies, we hope to provide an effective means to represent data and compute on it. The implementation provides an interface to the common DataFrame representation used in data science. We also implemented a D4M.jl interface to an emerging database technology, TileDB. The second project is the MEDIC framework, which aims to map out the process of moving a common machine learning engine into a streaming context. We implemented two versions of our solution to the Twitter Trend Prediction problem: one in Julia and one in Spark. We verified that our solution is valid by comparing the Julia version against a previous result from Stanislav Nikolov's master's thesis, Trend or No Trend. We also verified our solution with the Spark Streaming engine. by Alexander Y Chen. M. Eng.
author2 Alan Edelman.
author_facet Alan Edelman.
Chen, Alexander Y
author Chen, Alexander Y
author_sort Chen, Alexander Y
title Tools and frameworks for data abstraction in a performance context
publisher Massachusetts Institute of Technology
publishDate 2017
url http://hdl.handle.net/1721.1/112835