Summary: | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-submitted PDF version of thesis. === Includes bibliographical references (pages 109-111). === As data science reaches more fields and proves effective in a wide variety of applications, easy-to-understand, high-performance data science tools are becoming more important. Such tools tend to take one of two general forms: composable or template-based. We have researched and developed an example of each form. The first project is an implementation of the D4M schema in the Julia language. In our tests, this implementation ran faster than the optimized versions in both Matlab and Octave. With this combination of technologies, we hope to provide an effective means to represent data and compute on it. The implementation also provides an interface to the common DataFrame representation used in data science. In addition, we implemented a D4M.jl interface to an emerging database technology, TileDB. The second project is the MEDIC framework, which aims to map out the process of taking a common machine learning engine into a streaming context. We implemented two versions of our solution to the Twitter Trend Prediction problem: one in Julia and one in Spark. We validated our solution by comparing the Julia version against a previous result from Mr. Stanislav Nikolov's master's thesis, Trend or No Trend. We also verified our solution with the Spark Streaming engine. === by Alexander Y Chen. === M. Eng.
|