Tools and frameworks for data abstraction in a performance context
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-s...
Main Author: | Chen, Alexander Y |
---|---|
Other Authors: | Alan Edelman |
Format: | Others |
Language: | English |
Published: | Massachusetts Institute of Technology 2017 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/112835 |
id |
ndltd-MIT-oai-dspace.mit.edu-1721.1-112835 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-MIT-oai-dspace.mit.edu-1721.1-112835 2019-05-02T16:09:17Z Tools and frameworks for data abstraction in a performance context Chen, Alexander Y Alan Edelman. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 109-111). As data science reaches more fields and proves effective in a wide variety of applications, the importance of easy-to-understand, high-performance data science tools is growing. Tools tend to take one of two general forms: composable or template-based. We have researched and developed examples of each of these forms. The first project is an implementation of the D4M schema in the Julia language. In our tests, this implementation ran faster than the optimized versions in both Matlab and Octave. With this combination of technology, we hope to provide an effective means to represent data and compute on it. This implementation enables an interface with the common DataFrame representation used in data science. We also implemented a D4M.jl interface with an emerging database technology, TileDB. The second project is the MEDIC framework, which aims to map the process of taking a common machine learning engine into a streaming context. We implemented two versions of our solution to the Twitter Trend Prediction problem: one in Julia and one in Spark. We validated our solution by comparing the Julia version with a previous result in Mr. Stanislav Nikolov's master's thesis, Trend or No Trend. We have also verified our solution with the Spark Streaming engine. by Alexander Y Chen. M. Eng. 2017-12-20T17:24:39Z 2017-12-20T17:24:39Z 2017 2017 Thesis http://hdl.handle.net/1721.1/112835 1015200950 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 111 pages application/pdf Massachusetts Institute of Technology |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Electrical Engineering and Computer Science. |
spellingShingle |
Electrical Engineering and Computer Science. Chen, Alexander Y Tools and frameworks for data abstraction in a performance context |
description |
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-submitted PDF version of thesis. === Includes bibliographical references (pages 109-111). === As data science reaches more fields and proves effective in a wide variety of applications, the importance of easy-to-understand, high-performance data science tools is growing. Tools tend to take one of two general forms: composable or template-based. We have researched and developed examples of each of these forms. The first project is an implementation of the D4M schema in the Julia language. In our tests, this implementation ran faster than the optimized versions in both Matlab and Octave. With this combination of technology, we hope to provide an effective means to represent data and compute on it. This implementation enables an interface with the common DataFrame representation used in data science. We also implemented a D4M.jl interface with an emerging database technology, TileDB. The second project is the MEDIC framework, which aims to map the process of taking a common machine learning engine into a streaming context. We implemented two versions of our solution to the Twitter Trend Prediction problem: one in Julia and one in Spark. We validated our solution by comparing the Julia version with a previous result in Mr. Stanislav Nikolov's master's thesis, Trend or No Trend. We have also verified our solution with the Spark Streaming engine. === by Alexander Y Chen. === M. Eng. |
author2 |
Alan Edelman. |
author_facet |
Alan Edelman. Chen, Alexander Y |
author |
Chen, Alexander Y |
author_sort |
Chen, Alexander Y |
title |
Tools and frameworks for data abstraction in a performance context |
title_short |
Tools and frameworks for data abstraction in a performance context |
title_full |
Tools and frameworks for data abstraction in a performance context |
title_fullStr |
Tools and frameworks for data abstraction in a performance context |
title_full_unstemmed |
Tools and frameworks for data abstraction in a performance context |
title_sort |
tools and frameworks for data abstraction in a performance context |
publisher |
Massachusetts Institute of Technology |
publishDate |
2017 |
url |
http://hdl.handle.net/1721.1/112835 |
work_keys_str_mv |
AT chenalexandery toolsandframeworksfordataabstractioninaperformancecontext |
_version_ |
1719035188061667328 |