High-Performance Computing For Support Vector Machines

Machine learning algorithms are very successful in solving classification and regression problems; however, the immense amount of data created by digitalization slows down training and prediction, if the problems remain solvable at all. High-Performance Computing (HPC) and particularly parallel computing ar...

Bibliographic Details
Main Author: Tavara, Shirin
Format: Others
Language: English
Published: Högskolan i Skövde, Institutionen för informationsteknologi 2018
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-16556
http://nbn-resolving.de/urn:isbn:978-91-984187-8-1
id ndltd-UPSALLA1-oai-DiVA.org-his-16556
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-his-16556
2019-02-15T05:59:04Z
High-Performance Computing For Support Vector Machines
eng
Tavara, Shirin
Högskolan i Skövde, Institutionen för informationsteknologi
Högskolan i Skövde, Forskningscentrum för Informationsteknologi
Skövde : University of Skövde
2018
Computer Sciences
Datavetenskap (datalogi)
Licentiate thesis, comprehensive summary
info:eu-repo/semantics/masterThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-16556
urn:isbn:978-91-984187-8-1
Dissertation Series ; 26 (2018)
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Computer Sciences
Datavetenskap (datalogi)
description Machine learning algorithms are very successful in solving classification and regression problems; however, the immense amount of data created by digitalization slows down training and prediction, if the problems remain solvable at all. High-Performance Computing (HPC), and parallel computing in particular, are promising tools for reducing the running time of machine learning algorithms. Support Vector Machines (SVMs) are among the most popular supervised machine learning techniques that benefit from advances in HPC to overcome the problems posed by big data; however, implementing SVMs efficiently in parallel is a complex endeavour. While there are many parallel techniques for improving the performance of SVMs, there is no clear roadmap for every application scenario. This thesis is based on a collection of publications. It addresses the problems of parallel SVM implementation through four research questions, all of which are answered through three research articles. In the first research question, the thesis investigates how important factors such as parallel algorithms, HPC tools, and heuristics affect the efficiency of parallel SVM implementations. This leads to identifying the state-of-the-art parallel implementations of SVMs and their pros and cons, and to suggesting possible avenues for future research. It is up to the user to balance computation time against classification accuracy. In the second research question, the thesis explores the impact of changes in problem size and the values of the corresponding SVM parameters that lead to significant performance. This leads to addressing the impact of the problem size on the optimal choice of important parameters. In addition, the thesis shows the existence of a threshold between the number of cores and the training time. In the third research question, the thesis investigates the impact of the network topology on the performance of a network-based SVM. This leads to three key contributions. The first is to show how much the expansion property of the network impacts convergence. The second is to show which network topology is preferable for using the computing power efficiently. The third is to supply an implementation that makes the theoretical advances practically available. The results show that graphs with large spectral gaps and higher degrees exhibit accelerated convergence. In the last research question, the thesis combines all contributions in the articles and offers recommendations towards implementing an efficient framework for SVMs on large-scale problems.
author Tavara, Shirin
title High-Performance Computing For Support Vector Machines
publisher Högskolan i Skövde, Institutionen för informationsteknologi
publishDate 2018
url http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-16556
http://nbn-resolving.de/urn:isbn:978-91-984187-8-1