Programming the Linpack Benchmark for the IBM PowerXCell 8i Processor

Bibliographic Details
Main Authors: Michael Kistler, John Gunnels, Daniel Brokenshire, Brad Benton
Format: Article
Language: English
Published: Hindawi Limited, 2009-01-01
Series: Scientific Programming, vol. 17, no. 1-2, pp. 43-57
Online Access:http://dx.doi.org/10.3233/SPR-2009-0278
ISSN: 1058-9244, 1875-919X

Abstract

In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCenter QS22, which incorporates two IBM PowerXCell 8i processors. The PowerXCell 8i is a new implementation of the Cell Broadband Engine™ architecture and contains a set of special-purpose processing cores known as Synergistic Processing Elements (SPEs). The SPEs can be used as computational accelerators to augment the main PowerPC processor. With the added computational capability of the SPEs, each PowerXCell 8i reaches a peak double-precision floating-point performance of 108.8 GFLOPS. We explain how we modified the standard open-source implementation of Linpack to accelerate key computational kernels using the SPEs of the PowerXCell 8i processors. We describe in detail the implementation and performance of the computational kernels, and we also explain how we employed the SPEs for high-speed data movement and reformatting. The result of these modifications is a Linpack benchmark optimized for the IBM PowerXCell 8i processor that achieves 170.7 GFLOPS on a BladeCenter QS22 with 32 GB of DDR2 SDRAM memory. Our implementation of Linpack also supports clusters of QS22s and was used to achieve a result of 11.1 TFLOPS on a cluster of 84 QS22 blades. We compare our results on a single BladeCenter QS22 with the base Linpack implementation without SPE acceleration to illustrate the benefits of our optimizations.