GAM-Cluster: Accelerating MetaCluster5.0 with GPU

Bibliographic Details
Main Authors: Chen, Chih Lin, 陳致霖
Other Authors: Tang, Chuan Yi
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/puuq5m
Description
Summary: Master's thesis === National Tsing Hua University === Department of Computer Science === 104 === MetaCluster5.0 is a program for metagenomic binning, which classifies similar reads (DNA fragments) in a metagenomic sample into clusters. Because a typical sample contains up to millions of reads, and pairwise comparisons between reads are needed to determine their similarity, the running time is long. In this thesis, our goal is to accelerate MetaCluster5.0. We profiled MetaCluster5.0 and found that the performance bottleneck lies in its component functions USMerge and GetNeighbor. Both functions are good candidates for GPU parallelization; various techniques, such as data partitioning, use of shared memory, table lookup, output buffering, and randomization, are proposed and found to be effective. Our experimental results show a speedup of 3.1 times in USMerge and 8.1 times in GetNeighbor over the original 40-thread parallel version, or a speedup of 64.4 times and 178.3 times, respectively, over the original single-thread version.
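
The two sets of speedup figures in the summary can be related by simple arithmetic. The sketch below is not from the thesis; it assumes that both ratios for each function were measured on the same input and hardware, in which case dividing the single-thread speedup by the 40-thread speedup gives the scaling of the original 40-thread CPU version over the single-thread version. The names `reported` and `implied_cpu_scaling` are illustrative only.

```python
# Speedups of the GPU version as reported in the summary.
reported = {
    "USMerge": {"vs_40_threads": 3.1, "vs_1_thread": 64.4},
    "GetNeighbor": {"vs_40_threads": 8.1, "vs_1_thread": 178.3},
}


def implied_cpu_scaling(vs_multi: float, vs_single: float) -> float:
    """Scaling of the 40-thread CPU run over the single-thread run.

    Assumption (not stated in the source): both ratios share the same
    GPU timing, so (T1 / Tgpu) / (T40 / Tgpu) = T1 / T40.
    """
    return vs_single / vs_multi


implied = {
    name: implied_cpu_scaling(s["vs_40_threads"], s["vs_1_thread"])
    for name, s in reported.items()
}

for name, ratio in implied.items():
    print(f"{name}: 40-thread CPU version is about {ratio:.1f}x single-thread")
```

Under that assumption, the 40-thread CPU version comes out at roughly 21 to 22 times the single-thread version for both functions, so the reported GPU gains are in addition to an already parallel baseline.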