Deskside box with lotsa GPUs

Testing this for a partner.

A Pegasus deskside supercomputer with 12 Xeon X5690 CPU cores (two six-core CPUs), 48 GB RAM, a 500 MB/s IO channel (soon to be 1 GB/s), and a GTX 260 graphics card, connected to an XCT a-Brix 2U unit with 4x NVIDIA Fermi C2050s (normally we'd use a JackRabbit unit, but they are all busy with customer projects right now).

First, let's see what's there:

[root@pegasus C]# lspci | grep nVidia | grep VGA
06:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
0b:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
84:00.0 VGA compatible controller: nVidia Corporation GT200 [GeForce GTX 260] (rev a1)
89:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
8e:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)

Ahhh …. nice! And yes, you can order units like this now from the day job.
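If you want to count those Tesla boards programmatically rather than eyeballing the `lspci | grep` output, a tiny sketch does it. This is just illustrative: the `LSPCI` string below is a made-up sample in the same format as the output above, and the filter mirrors the `grep nVidia | grep VGA` pipeline.

```python
# Hypothetical sample in the same format as the lspci output above
LSPCI = """\
06:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
84:00.0 VGA compatible controller: nVidia Corporation GT200 [GeForce GTX 260] (rev a1)
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
"""

def count_nvidia_vga(text):
    # Same filter as `lspci | grep nVidia | grep VGA`:
    # keep lines that mention both strings, then count them
    return sum(1 for line in text.splitlines()
               if "nVidia" in line and "VGA" in line)

print(count_nvidia_vga(LSPCI))  # 2 for the sample above; 5 on this box
```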

Now let's have a little fun.

[root@pegasus C]# bin/linux/release/MonteCarloMultiGPU  
main(): generating input data...
main(): starting 5 host threads...
main(): waiting for GPU results...
main(): GPU statistics
GPU #0
Options         : 52
Simulation paths: 262144
GPU #1
Options         : 51
Simulation paths: 262144
GPU #2
Options         : 51
Simulation paths: 262144
GPU #3
Options         : 51
Simulation paths: 262144
GPU #4
Options         : 51
Simulation paths: 262144

Total time (ms.): 0.073000
Options per sec.: 3506849.366609
main(): comparing Monte Carlo and Black-Scholes results...
L1 norm        : 2.995473E-06
Average reserve: 382.126091
PASSED 

Yeah … baby! It even ran on the less capable GTX 260.
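The check MonteCarloMultiGPU does at the end — comparing the Monte Carlo estimates against the Black-Scholes closed form — is easy to sketch on the CPU. A minimal single-option version in Python (the strike, rate, volatility, and maturity here are made-up parameters for illustration, not what the sample uses; the path count matches the 262144 per GPU shown above):

```python
import math
import random

def black_scholes_call(S, K, r, sigma, T):
    # Closed-form Black-Scholes price for a European call
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def monte_carlo_call(S, K, r, sigma, T, paths, seed=42):
    # Simulate terminal prices under risk-neutral GBM, average the
    # discounted payoffs -- the same idea the GPU kernel parallelizes
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(paths):
        z = rng.gauss(0.0, 1.0)
        total += max(S * math.exp(drift + vol * z) - K, 0.0)
    return math.exp(-r * T) * total / paths

bs = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
mc = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, paths=262144)
print(bs, mc, abs(mc - bs))
```

With 262144 paths the two prices agree to a couple of decimal places, which is why the sample can report such a tiny L1 norm across its 256 options.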

Let's try some random numbers.

[root@pegasus C]# bin/linux/release/MersenneTwister 
bin/linux/release/MersenneTwister Starting...

Initializing data for 24000000 samples...
Loading CPU and GPU twisters configurations...
Generating random numbers on GPU...

MersenneTwister, Throughput = 2.4075 GNumbers/s, Time = 0.00997 s, Size = 24002560 Numbers, NumDevsUsed = 1, Workgroup = 128

Reading back the results...
Checking GPU results...
 ...generating random numbers on CPU using reference generator
 ...applying Box-Muller transformation on CPU
 ...comparing the results

Max absolute error: 2.324581E-06
L1 norm: 1.713886E-07

PASSED

Not bad, but it used only one device.
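The "Box-Muller transformation on CPU" step in that log is the classic trick for turning pairs of uniform random numbers into Gaussians, which is what the sample's reference generator does before comparing against the GPU. A quick CPU sketch (sample size and seed are arbitrary choices here):

```python
import math
import random

def box_muller(u1, u2):
    # Map a pair of uniforms in (0,1] x [0,1) to two independent
    # standard normal deviates
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(1)
samples = []
for _ in range(100000):
    u1 = 1.0 - rng.random()   # shift [0,1) to (0,1] so log(u1) is defined
    u2 = rng.random()
    z0, z1 = box_muller(u1, u2)
    samples.append(z0)
    samples.append(z1)

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)  # should be close to 0 and 1
```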

[root@pegasus C]# bin/linux/release/simpleMultiGPU  
CUDA-capable device count: 5
Generating input data...

Computing with 5 GPU's...
  GPU Processing time: 29965.884766 (ms)

Computing with Host CPU...

Comparing GPU and Host CPU results...
  GPU sum: 16777304.000000
  CPU sum: 16777294.395033
  Relative difference: 5.724980E-07 

PASSED
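What simpleMultiGPU is doing under the hood is a split reduction: carve the input array into one chunk per device, reduce each chunk independently, then combine the partial sums on the host and compare against a straight CPU sum via a relative difference, exactly like the figure it prints. A CPU-only sketch of that structure (the data and the 5-way split are illustrative, the worker count just mirrors the 5 GPUs above):

```python
def split_sum(data, n_workers):
    # Mimic simpleMultiGPU's structure: split the array into one chunk
    # per "device", reduce each chunk independently, then combine the
    # partial results on the host
    chunk = (len(data) + n_workers - 1) // n_workers
    partials = [sum(data[i * chunk:(i + 1) * chunk]) for i in range(n_workers)]
    return sum(partials)

data = [(i % 7) * 0.25 for i in range(1_000_000)]  # arbitrary test data
gpu_like = split_sum(data, 5)
cpu_like = sum(data)
rel_diff = abs(gpu_like - cpu_like) / abs(cpu_like)
print(gpu_like, cpu_like, rel_diff)
```

On real hardware the two sums differ slightly (as in the log above) because single-precision floating-point addition is not associative, so chunked summation rounds differently than a single sequential pass; hence the relative-difference tolerance check rather than an exact comparison.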

I've been meaning to get a set of more invasive tests going on these units. I'll see if I can get my Riemann code ported this weekend, then try a few other things as well.
