Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster

Master's thesis === National Taiwan University === Institute of Applied Mathematical Sciences === Academic year 106 === We implement, accelerate, and improve "Higher-Order Singular Value Decomposition" and "Contour Integral Based Eigen Decomposition". In "Higher-Order Singular Value Decomposition", we implement two methods with different strategies and improve their ability to handle large tensor problems.

Bibliographic Details
Main Author: Yu-Hsiang Tsai (蔡宇翔)
Other Authors: Weichung Wang
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/b3s659
id ndltd-TW-106NTU05507003
record_format oai_dc
spelling ndltd-TW-106NTU05507003 2019-05-30T03:50:44Z http://ndltd.ncl.edu.tw/handle/b3s659 Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster 在叢集伺服器上高效能多維奇異值分解及基於閉曲線積分的特徵值分解 (High-Performance Multidimensional Singular Value Decomposition and Contour-Integral-Based Eigenvalue Decomposition on Cluster Servers) Yu-Hsiang Tsai 蔡宇翔 Master's thesis, National Taiwan University, Institute of Applied Mathematical Sciences, academic year 106 Weichung Wang 王偉仲 2018 thesis 65 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description We implement, accelerate, and improve "Higher-Order Singular Value Decomposition" and "Contour Integral Based Eigen Decomposition". In "Higher-Order Singular Value Decomposition", we implement two methods with different strategies and improve their ability to handle large tensor problems. With the explosion of big data, finding fast and efficient ways to compress and analyze large data sets with multi-way relationships (i.e., tensors) has become critical in high-performance computing. We implement two existing methods for computing the Tucker decomposition: Higher-Order Singular Value Decomposition (HOSVD) and Sequentially Truncated Higher-Order Singular Value Decomposition (ST-HOSVD). Implementing them on GPUs is difficult because the whole tensor usually cannot be stored in GPU memory. We use the QR method and the Gram method to reduce the problem to a size that fits in GPU memory. We also implement the QR and Gram methods part by part, which lets us handle large data and hide data transfer behind computation. Overall, we achieve a 163.21x speedup over a CUDA library-based solution. In future work, we plan to apply it to real applications.
In "Contour Integral Based Eigen Decomposition", we propose a divide-and-conquer workflow for computing the eigenpairs in a specified region that contains many eigenpairs, using a contour-integral-based eigensolver with a locking technique, and we apply it to the generalized eigenvalue problem arising from organic material simulation. Solving eigenvalue problems is an essential part of many applications. The matrices are often large and sparse, but only the eigenpairs in a region of interest are required. Several solvers, such as FEAST and CIRR, can compute the eigenpairs in a selected region. When the selected region contains many eigenpairs, however, these solvers become slow, so the region must be partitioned. Choosing the partition is difficult but critical, because every sub-region must be solved efficiently. In addition, when some eigenvectors converge early, the solver still spends time on them. We introduce two partitioning methods: uniform division by the estimated number of eigenvalues, and division based on domain knowledge. With a proper partition, the eigensolver can handle regions containing many eigenpairs and achieves better performance. We also use the locking technique to avoid spending time on converged eigenpairs. In future work, we would like to design an automatic workflow that generates partitions whose sub-regions take almost the same execution time.
author2 Weichung Wang
author Yu-Hsiang Tsai
蔡宇翔
title Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/b3s659
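The Gram method mentioned in the abstract rests on a standard identity: the leading left singular vectors of a mode-n unfolding Y_(n) are the leading eigenvectors of the much smaller Gram matrix G = Y_(n) Y_(n)^T, so only an I_n x I_n eigenproblem has to be solved. The Python/NumPy sketch below shows that idea inside ST-HOSVD and accumulates G one column block at a time to mimic part-by-part processing. It is a minimal CPU-only sketch under stated assumptions, not the thesis's GPU implementation: the helper names, the chunk size chunk_cols, and the column-blocking scheme are hypothetical, and the QR-method variant is not shown.

```python
import numpy as np


def unfold(t, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)


def mode_product(t, m, mode):
    """Mode-n product t x_mode m, where m has shape (new_dim, t.shape[mode])."""
    return np.moveaxis(np.tensordot(m, np.moveaxis(t, mode, 0), axes=1), 0, mode)


def gram_in_chunks(yn, chunk_cols):
    """Accumulate G = Yn @ Yn.T one column block at a time.

    Hypothetical chunking: on a GPU each block could be transferred while the
    previous one is multiplied; here everything stays in host memory.
    """
    g = np.zeros((yn.shape[0], yn.shape[0]))
    for start in range(0, yn.shape[1], chunk_cols):
        block = yn[:, start:start + chunk_cols]
        g += block @ block.T
    return g


def st_hosvd_gram(t, ranks, chunk_cols=64):
    """ST-HOSVD where U_n = leading eigenvectors of the mode-n Gram matrix.

    The core is truncated immediately after each mode, so later modes work
    on a smaller tensor (the "sequentially truncated" part).
    """
    core, factors = t, []
    for mode, r in enumerate(ranks):
        g = gram_in_chunks(unfold(core, mode), chunk_cols)
        _, v = np.linalg.eigh(g)      # eigenvalues come back in ascending order
        u = v[:, ::-1][:, :r]         # keep the r dominant eigenvectors
        factors.append(u)
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Tucker reconstruction: core x_1 U_1 x_2 U_2 ... x_N U_N."""
    t = core
    for mode, u in enumerate(factors):
        t = mode_product(t, u, mode)
    return t


# Usage: a 6 x 5 x 4 tensor with exact multilinear rank (2, 3, 2).
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 3, 2))
true_factors = [np.linalg.qr(rng.standard_normal((n, r)))[0]
                for n, r in [(6, 2), (5, 3), (4, 2)]]
x = reconstruct(true_core, true_factors)
core, factors = st_hosvd_gram(x, ranks=(2, 3, 2), chunk_cols=4)
rel_err = np.linalg.norm(reconstruct(core, factors) - x) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_err:.1e}")
```

Forming G squares the singular values of Y_(n), so small singular values lose relative accuracy; a QR-based variant avoids this at a higher arithmetic cost, which is one reason to keep both methods available.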
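The first partitioning method in the abstract, uniform division by the estimated number of eigenvalues, can be illustrated with a counting function. This sketch assumes a hypothetical callable count(x) that returns the (estimated) number of eigenvalues below x; the abstract does not say how the thesis obtains that estimate. For a Hermitian pencil (A, B) with B positive definite, one exact option would be Sylvester's law of inertia (counting the negative eigenvalues of A - xB), but that is an assumption here. Placing each cut midway between neighbouring eigenvalues, so that none sits on a contour, is a choice made for this sketch.

```python
from bisect import bisect_left


def first_at_least(count, lo, hi, m, iters=60):
    """Bisection: approximate the smallest x in [lo, hi] with count(x) >= m.

    Assumes count is nondecreasing and count(hi) >= m.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count(mid) >= m:
            hi = mid
        else:
            lo = mid
    return hi


def uniform_partition(count, a, b, k):
    """Split [a, b] into k sub-intervals holding about the same eigenvalue count.

    count(x) is a hypothetical (estimated) number of eigenvalues below x.
    """
    base, total = count(a), count(b) - count(a)
    if total < k:
        raise ValueError("fewer eigenvalues than requested sub-regions")
    cuts = [a]
    for j in range(1, k):
        m = base + (j * total) // k                       # target running count
        left = first_at_least(count, cuts[-1], b, m)       # just past eigenvalue m
        right = first_at_least(count, cuts[-1], b, m + 1)  # just past eigenvalue m+1
        cuts.append(0.5 * (left + right))                  # cut inside the gap
    cuts.append(b)
    return list(zip(cuts[:-1], cuts[1:]))


# Usage with a toy stand-in for an estimator: eight known eigenvalues in [0, 8].
eigs = [0.5 + i for i in range(8)]


def count(x):
    """Number of toy eigenvalues strictly below x."""
    return bisect_left(eigs, x)


parts = uniform_partition(count, 0.0, 8.0, 4)
print(parts)
```

Each sub-interval would then be enclosed by its own contour and solved independently; with only an estimated count the sub-regions are balanced only approximately.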
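The locking technique described in the abstract can be illustrated with a simplified FEAST-style subspace iteration. A trapezoidal-rule approximation of the contour integral (1/2πi)∮(zI - A)^{-1} dz filters a block of vectors, Rayleigh–Ritz extracts approximate eigenpairs, and pairs inside the interval whose residual drops below tol are locked, i.e. removed from the block that is filtered again. This is a sketch under simplifying assumptions, not the thesis's solver and not the FEAST or CIRR code: it treats a small dense real symmetric standard problem (B = I) with dense solves and a circular contour, and the function and parameter names (m0, n_quad, tol) are hypothetical.

```python
import numpy as np


def contour_filter(a, y, center, radius, n_quad=16):
    """Approximate P @ y, where P = (1/2πi)∮(zI - A)^{-1} dz is the spectral
    projector for the circle |z - center| = radius (trapezoidal rule)."""
    eye = np.eye(a.shape[0])
    q = np.zeros(y.shape)
    for k in range(n_quad):
        w = radius * np.exp(2j * np.pi * (k + 0.5) / n_quad)  # z_k - center
        # dz = i*w*dθ, so node k contributes (w / n_quad) * (z_k I - A)^{-1} y.
        # A is real symmetric and nodes come in conjugate pairs: keep the real part.
        q += (w / n_quad * np.linalg.solve((center + w) * eye - a, y)).real
    return q


def contour_eigs_with_locking(a, lo, hi, m0, tol=1e-8, max_iter=20, seed=0):
    """Eigenpairs of symmetric `a` in (lo, hi); converged pairs are locked."""
    n = a.shape[0]
    center, radius = 0.5 * (lo + hi), 0.5 * (hi - lo)
    y = np.random.default_rng(seed).standard_normal((n, m0))
    locked_vals, locked_vecs = [], np.zeros((n, 0))
    for _ in range(max_iter):
        q = contour_filter(a, y, center, radius)
        q -= locked_vecs @ (locked_vecs.T @ q)   # stay orthogonal to locked vectors
        q, _ = np.linalg.qr(q)
        theta, s = np.linalg.eigh(q.T @ a @ q)   # Rayleigh-Ritz on the active block
        x = q @ s
        res = np.linalg.norm(a @ x - x * theta, axis=0)
        inside = (theta > lo) & (theta < hi)
        done = inside & (res < tol)
        locked_vals.extend(theta[done])
        locked_vecs = np.hstack([locked_vecs, x[:, done]])
        y = x[:, ~done]                          # locked pairs are not filtered again
        if not np.any(inside & ~done) or y.shape[1] == 0:
            break
    order = np.argsort(locked_vals)
    return np.array(locked_vals)[order], locked_vecs[:, order]


# Usage: a 40 x 40 symmetric matrix with eigenvalues 1, 2, ..., 40.
rng = np.random.default_rng(1)
qm, _ = np.linalg.qr(rng.standard_normal((40, 40)))
a = (qm * np.arange(1.0, 41.0)) @ qm.T
a = 0.5 * (a + a.T)
vals, vecs = contour_eigs_with_locking(a, 9.5, 14.5, m0=8)
print(vals)
```

For large sparse matrices, each solve with zI - A would use a sparse factorization or an iterative solver, and those solves dominate the cost; locking shrinks the number of right-hand sides they must handle in later iterations.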