Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster

Master's === National Taiwan University === Institute of Applied Mathematical Sciences === 106 === We implement, accelerate, and improve "Higher-Order Singular Value Decomposition" and "Contour Integral based Eigen Decomposition". In "Higher-Order Singular Value Decomposition", we implemented two methods with different strategi...

Bibliographic Details
Main Authors:	Yu-Hsiang Tsai, 蔡宇翔
Other Authors:	Weichung Wang
Format:	Others
Language:	en_US
Published:	2018
Online Access:	http://ndltd.ncl.edu.tw/handle/b3s659

id	ndltd-TW-106NTU05507003
record_format	oai_dc
spelling	ndltd-TW-106NTU05507003 2019-05-30T03:50:44Z http://ndltd.ncl.edu.tw/handle/b3s659 Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster 在叢集伺服器上高效能多維奇異值分解及基於閉曲線積分的特徵值分解 Yu-Hsiang Tsai 蔡宇翔 Master's, National Taiwan University, Institute of Applied Mathematical Sciences 106 Weichung Wang 王偉仲 2018 thesis 65 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Taiwan University === Institute of Applied Mathematical Sciences === 106 === We implement, accelerate, and improve two algorithms: Higher-Order Singular Value Decomposition (HOSVD) and contour-integral-based eigendecomposition. For HOSVD, we implemented two methods with different strategies and improved their ability to handle large tensors. With the explosion of big data, compressing and analyzing large multi-way data sets (tensors) quickly and efficiently has become critical in high-performance computing. We implement two existing methods, HOSVD and Sequentially Truncated HOSVD, to compute the Tucker decomposition. Implementing them on a GPU is difficult because the whole tensor usually does not fit in GPU memory. We use a QR method and a Gram method to reduce the problem to a size that fits in GPU memory. We also implemented part-by-part versions of the QR and Gram methods, which let us handle large data and overlap computation with data transfer. We achieve a 163.21x speedup over a CUDA library-based solution. In the future, we want to apply this work to real applications.
For contour-integral-based eigendecomposition, we propose a divide-and-conquer flow for computing the eigenpairs in a specified region that contains many eigenpairs, using a contour-integral eigensolver with a locking technique, and we use it to solve a generalized eigenvalue problem arising from organic material simulation. Solving eigenvalue problems is an essential part of many applications. The matrices are often large and sparse, and only the eigenpairs in a region of interest are required. Several solvers, such as FEAST and CIRR, can compute the eigenpairs in a selected region. When the region contains many eigenpairs, these solvers become slow, so the region must be partitioned. Choosing the partition is difficult but critical, because each sub-region must be solved efficiently. In addition, when some eigenvectors converge early, the solver still spends time on them. We introduce two partition methods: uniform division by the estimated number of eigenvalues, and division based on domain knowledge. With a proper partition, the eigensolver can handle regions containing many eigenpairs and performs better. We also use the locking technique to avoid spending time on converged eigenpairs. In the future, we would like to design an automatic flow that generates a partition whose sub-regions take nearly equal execution time.
author2	Weichung Wang
author_facet	Weichung Wang Yu-Hsiang Tsai 蔡宇翔
author	Yu-Hsiang Tsai 蔡宇翔
spellingShingle	Yu-Hsiang Tsai 蔡宇翔 Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster
author_sort	Yu-Hsiang Tsai
title	Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster
title_short	Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster
title_full	Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster
title_fullStr	Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster
title_full_unstemmed	Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster
title_sort	efficient higher-order singular value decomposition and contour integral based eigen decomposition on cpu/gpu cluster
publishDate	2018
url	http://ndltd.ncl.edu.tw/handle/b3s659
work_keys_str_mv	AT yuhsiangtsai efficienthigherordersingularvaluedecompositionandcontourintegralbasedeigendecompositiononcpugpucluster AT càiyǔxiáng efficienthigherordersingularvaluedecompositionandcontourintegralbasedeigendecompositiononcpugpucluster AT yuhsiangtsai zàicóngjícìfúqìshànggāoxiàonéngduōwéiqíyìzhífēnjiějíjīyúbìqūxiànjīfēndetèzhēngzhífēnjiě AT càiyǔxiáng zàicóngjícìfúqìshànggāoxiàonéngduōwéiqíyìzhífēnjiějíjīyúbìqūxiànjīfēndetèzhēngzhífēnjiě
_version_	1719195399398359040
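
The Sequentially Truncated HOSVD with the Gram method that the abstract describes can be illustrated with a small CPU-only sketch. This is not the thesis code: it assumes NumPy in place of the GPU libraries, the function names (`unfold`, `gram_factor`, `st_hosvd`, `reconstruct`) and the `block` parameter are hypothetical, and the column-block loop only stands in for the part-by-part transfers mentioned in the abstract.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def gram_factor(tensor, mode, rank, block=1024):
    """Leading `rank` left singular vectors of the unfolding, via its Gram matrix.

    The Gram matrix is accumulated one column block at a time, so only a
    single block has to be resident at once (assumed stand-in for
    part-by-part GPU transfers).
    """
    mat = unfold(tensor, mode)
    gram = np.zeros((mat.shape[0], mat.shape[0]))
    for start in range(0, mat.shape[1], block):
        part = mat[:, start:start + block]
        gram += part @ part.T
    _, eigvecs = np.linalg.eigh(gram)      # eigenvalues come back ascending
    return eigvecs[:, ::-1][:, :rank]      # keep the `rank` largest

def st_hosvd(tensor, ranks):
    """Sequentially truncated HOSVD: shrink the core after each mode."""
    core, factors = tensor, []
    for mode, rank in enumerate(ranks):
        u = gram_factor(core, mode, rank)
        factors.append(u)
        # core <- core x_mode u^T (mode-n product with the transposed factor)
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

def reconstruct(core, factors):
    """Tucker reconstruction: multiply the core by every factor matrix."""
    out = core
    for mode, u in enumerate(factors):
        out = np.moveaxis(np.tensordot(u, out, axes=(1, mode)), 0, mode)
    return out
```

Because each mode is truncated before the next one is processed, later Gram matrices are built from a smaller core, which is the main difference from plain HOSVD. The Gram approach squares the condition number of the unfolding, so a QR-based factor may be preferable when accuracy matters more than speed.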

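The divide-and-conquer idea for the contour-integral eigensolver can be sketched in the same spirit. The code below is a dense, FEAST-like illustration for a real symmetric pencil (A, B) with B positive definite, assuming NumPy; it is not the thesis implementation and not the FEAST or CIRR API. All names (`contour_filter`, `eigs_in_interval`, `solve_partitioned`) are hypothetical, the partition is a plain equal-width split rather than a split by estimated eigenvalue count, and locking is omitted.

```python
import numpy as np

def contour_filter(a, b, y, center, radius, nodes=16):
    """Approximate spectral projector applied to y, via quadrature on a circle."""
    theta = 2 * np.pi * (np.arange(nodes) + 0.5) / nodes
    q = np.zeros(y.shape, dtype=complex)
    by = b @ y
    for t in theta:
        w = radius * np.exp(1j * t)
        # One shifted linear solve per quadrature node: (z B - A)^{-1} B y
        q += w * np.linalg.solve((center + w) * b - a, by)
    # Nodes are symmetric about the real axis, so the sum is real up to rounding.
    return (q / nodes).real

def eigs_in_interval(a, b, lo, hi, subspace, iters=4, seed=0):
    """Eigenpairs of A x = lambda B x with lambda in [lo, hi)."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    y = np.random.default_rng(seed).standard_normal((a.shape[0], subspace))
    for _ in range(iters):
        q, _ = np.linalg.qr(contour_filter(a, b, y, center, radius))
        # Rayleigh-Ritz: reduce to a small standard problem via Cholesky of Q^T B Q
        linv = np.linalg.inv(np.linalg.cholesky(q.T @ b @ q))
        vals, w = np.linalg.eigh(linv @ (q.T @ a @ q) @ linv.T)
        y = q @ (linv.T @ w)               # B-orthonormal Ritz vectors
    keep = (vals >= lo) & (vals < hi)
    return vals[keep], y[:, keep]

def solve_partitioned(a, b, lo, hi, parts, subspace):
    """Split [lo, hi) into equal-width sub-intervals and solve each separately."""
    edges = np.linspace(lo, hi, parts + 1)
    vals, vecs = [], []
    for left, right in zip(edges[:-1], edges[1:]):
        v, x = eigs_in_interval(a, b, left, right, subspace)
        vals.append(v)
        vecs.append(x)
    return np.concatenate(vals), np.hstack(vecs)
```

Each sub-interval is independent, so the calls inside `solve_partitioned` could run on different nodes or GPUs. The `subspace` size must exceed the number of eigenvalues in every sub-interval, which is why a partition balanced by eigenvalue count matters in practice.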
Similar Items