Testing this for a partner.
A Pegasus deskside supercomputer with 12 X5690 CPU cores (two hex-core Xeons), 48 GB RAM, a 500 MB/s IO channel (soon to be 1 GB/s), and a GTX 260 graphics card. It’s connected to an XCT a-Brix 2U unit with 4x NVIDIA Fermi C2050s (normally we’d use a JackRabbit unit, but they are all busy with customer projects right now).
First, let's see what's there:
[root@pegasus C]# lspci | grep nVidia | grep VGA
06:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
0b:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
84:00.0 VGA compatible controller: nVidia Corporation GT200 [GeForce GTX 260] (rev a1)
89:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
8e:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
Ahhh … nice! And yes, you can order units like this now from the day job.
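Of course, lspci only proves the cards show up on the PCI bus. A quick CUDA-side sanity check is to enumerate them with the runtime API. Here's a minimal sketch (my own, not from the SDK, so the filename and output format are arbitrary):

// devquery.cu -- minimal CUDA device enumeration sketch
// compile with: nvcc -o devquery devquery.cu
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA-capable device count: %d\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, %d SMs, %.0f MB global memory, compute capability %d.%d\n",
               i, prop.name, prop.multiProcessorCount,
               prop.totalGlobalMem / (1024.0 * 1024.0),
               prop.major, prop.minor);
    }
    return 0;
}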
Now let's have a little fun:
[root@pegasus C]# bin/linux/release/MonteCarloMultiGPU
main(): generating input data...
main(): starting 5 host threads...
main(): waiting for GPU results...
main(): GPU statistics
GPU #0
Options         : 52
Simulation paths: 262144
GPU #1
Options         : 51
Simulation paths: 262144
GPU #2
Options         : 51
Simulation paths: 262144
GPU #3
Options         : 51
Simulation paths: 262144
GPU #4
Options         : 51
Simulation paths: 262144

Total time (ms.): 0.073000
Options per sec.: 3506849.366609
main(): comparing Monte Carlo and Black-Scholes results...
L1 norm        : 2.995473E-06
Average reserve: 382.126091

PASSED
Yeah … baby! It even ran on the less capable GTX 260.
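Worth noting how the sample keeps 5 GPUs busy at once: one host thread per device, each bound to its GPU with cudaSetDevice(). Here's a stripped-down sketch of that pattern using pthreads; it's illustrative only, and the kernel, sizes, and names are mine, not the SDK's:

// multigpu_threads.cu -- one host thread per GPU, the pattern MonteCarloMultiGPU uses
// sketch only; compile with: nvcc -o multigpu_threads multigpu_threads.cu -lpthread
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

__global__ void busywork(float *x) { x[threadIdx.x] = sqrtf((float)threadIdx.x); }

static void *worker(void *arg)
{
    int dev = *(int *)arg;
    cudaSetDevice(dev);             // bind this host thread to its GPU
    float *d = 0;
    cudaMalloc(&d, 256 * sizeof(float));
    busywork<<<1, 256>>>(d);
    cudaDeviceSynchronize();        // wait for this GPU's kernel to finish
    cudaFree(d);
    printf("device %d done\n", dev);
    return 0;
}

int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);
    pthread_t tid[16];
    int ids[16];
    for (int i = 0; i < n && i < 16; ++i) {
        ids[i] = i;
        pthread_create(&tid[i], 0, worker, &ids[i]);
    }
    for (int i = 0; i < n && i < 16; ++i)
        pthread_join(tid[i], 0);
    return 0;
}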
Let's try some random numbers:
[root@pegasus C]# bin/linux/release/MersenneTwister
bin/linux/release/MersenneTwister Starting...

Initializing data for 24000000 samples...
Loading CPU and GPU twisters configurations...
Generating random numbers on GPU...

MersenneTwister, Throughput = 2.4075 GNumbers/s, Time = 0.00997 s, Size = 24002560 Numbers, NumDevsUsed = 1, Workgroup = 128

Reading back the results...
Checking GPU results...
...generating random numbers on CPU using reference generator
...applying Box-Muller transformation on CPU
...comparing the results
Max absolute error: 2.324581E-06
L1 norm: 1.713886E-07

PASSED
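The SDK sample ships its own Mersenne Twister kernels plus a separate Box-Muller pass. If you just want normally distributed numbers on the GPU, cuRAND will produce them directly; here's a minimal sketch (generator choice and sample count are my assumptions, not what the sample does internally):

// curand_normals.cu -- generate normal random numbers on the GPU with cuRAND
// sketch only; compile with: nvcc -o curand_normals curand_normals.cu -lcurand
#include <cstdio>
#include <cuda_runtime.h>
#include <curand.h>

int main(void)
{
    const size_t n = 24000000;              // match the sample's 24M figure
    float *d_out = 0;
    cudaMalloc(&d_out, n * sizeof(float));

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);

    // mean 0, stddev 1: normals straight from the generator, no separate Box-Muller step
    curandGenerateNormal(gen, d_out, n, 0.0f, 1.0f);
    cudaDeviceSynchronize();

    // spot-check a few values on the host
    float h[4];
    cudaMemcpy(h, d_out, sizeof(h), cudaMemcpyDeviceToHost);
    printf("first samples: %f %f %f %f\n", h[0], h[1], h[2], h[3]);

    curandDestroyGenerator(gen);
    cudaFree(d_out);
    return 0;
}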
Not bad, but the twister sample used only one device. Let's use all of them:
[root@pegasus C]# bin/linux/release/simpleMultiGPU
CUDA-capable device count: 5
Generating input data...

Computing with 5 GPU's...
GPU Processing time: 29965.884766 (ms)

Computing with Host CPU...

Comparing GPU and Host CPU results...
GPU sum: 16777304.000000
CPU sum: 16777294.395033
Relative difference: 5.724980E-07

PASSED
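The idea in simpleMultiGPU is straightforward: partition one big array sum across every device, then combine the partial results on the host. Here's a minimal single-threaded sketch of that partitioning (the actual sample uses host threads; on current toolkits a plain loop over cudaSetDevice() works, and all names and sizes here are mine):

// split_sum.cu -- partition an array sum across all visible GPUs, combine on the host
// sketch only; compile with: nvcc -o split_sum split_sum.cu
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// each block reduces up to 256 elements into one partial sum
__global__ void partial_sum(const float *x, int n, float *block_sums)
{
    __shared__ float s[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    s[threadIdx.x] = (i < n) ? x[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) block_sums[blockIdx.x] = s[0];
}

int main(void)
{
    const int N = 1 << 24;
    std::vector<float> h(N, 1.0f);               // all ones, so the sum should be N
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev == 0) { printf("no CUDA devices\n"); return 1; }

    double total = 0.0;
    int chunk = (N + ndev - 1) / ndev;           // elements per device, last may be short
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        int off = d * chunk;
        if (off >= N) break;
        int n = (off + chunk <= N) ? chunk : N - off;
        int blocks = (n + 255) / 256;
        float *dx = 0, *dsums = 0;
        cudaMalloc(&dx, n * sizeof(float));
        cudaMalloc(&dsums, blocks * sizeof(float));
        cudaMemcpy(dx, &h[off], n * sizeof(float), cudaMemcpyHostToDevice);
        partial_sum<<<blocks, 256>>>(dx, n, dsums);
        std::vector<float> bs(blocks);
        cudaMemcpy(&bs[0], dsums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
        for (int b = 0; b < blocks; ++b) total += bs[b];
        cudaFree(dx);
        cudaFree(dsums);
    }
    printf("GPU sum across %d device(s): %.1f (expected %d)\n", ndev, total, N);
    return 0;
}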
I've been meaning to get a set of more invasive tests going on these units. I'll see if I can get my Riemann code ported this weekend, and then try a few other things as well.