Fun Monday morning benchmarking

Running NCBI BLAST on the JackRabbit we are preparing for shipment. Used the nt database from last July (21 GB in size, 5+M sequences). Our A. thaliana query set had 1164 sequences and about 500k letters.
Took 8m 44s to BLAST these sequences against this database. This works out to about 2.1838e+13 cell updates per second: the product of the number of letters in the database and the number of letters in the query sequences, divided by the total wall clock time. As these are 3.2 GHz CPUs, the cycle time is about 0.3 ns, so we get about 3.2E+9 cycles per core per second. With 8 cores, this means about 2.56E+10 cycles per second. So, if these benchmarks are meaningful, we are getting pretty close to 850 cell updates per processor cycle.
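To make that arithmetic easy to re-run, here is a rough Python sketch. It assumes one database letter per byte (so 21 GB is treated as 21e9 letters) and exactly 500k query letters. Both are roundings on my part, so the result lands near the quoted figure rather than exactly on it. The function names are just for illustration.

```python
# Back-of-envelope check of the cell-update arithmetic.
# Assumption: one database letter per byte, so 21 GB is treated as 21e9 letters.
DB_LETTERS = 21e9
QUERY_LETTERS = 500e3          # "about 500k letters", rounded
WALL_SECONDS = 8 * 60 + 44     # 8m 44s
CLOCK_HZ = 3.2e9               # 3.2 GHz
CORES = 8


def cell_updates_per_second(db_letters, query_letters, wall_seconds):
    """Product of database and query letters, divided by wall clock time."""
    return db_letters * query_letters / wall_seconds


def updates_per_cycle(updates_per_second, clock_hz, cores):
    """Cell updates per aggregate processor cycle, summed over all cores."""
    return updates_per_second / (clock_hz * cores)


rounded = cell_updates_per_second(DB_LETTERS, QUERY_LETTERS, WALL_SECONDS)
print(f"rounded inputs: {rounded:.3e} cell updates/s")
print(f"rounded inputs: {updates_per_cycle(rounded, CLOCK_HZ, CORES):.0f} per cycle")
print(f"quoted figure:  {updates_per_cycle(2.1838e13, CLOCK_HZ, CORES):.0f} per cycle")
```

Either way the answer is in the high hundreds of cell updates per cycle, which is the number that looks suspicious.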

Something doesn’t seem right there. I think the issue could be the units.
If we pack our nucleotides into bits, we need 2 bits to represent ACGT. We need another bit to represent the N/X. So basically 3 bits per letter. Which means that the database has about 2.67 letters per byte, as does the sequence. Dividing the 850 by that factor twice (once for the database, once for the sequence) means that each processor cycle is handling about 120 of these packed entities.
Still not within the realm of believability. The processor has a fixed cycle time and can do only so many calculations per unit time. Which means that the “cell updates per second” metric may be only loosely connected to the actual work performed per cycle.
We need the amount of information handled to be down to 1-16 entities per cycle (the latter reserved for really good SIMD programmers). If we use 8 SIMD registers without doing anything fancy, we might be able to get this down to about 14.9 packed entities per SIMD register per cycle (roughly 120 divided by 8).
This would fit.
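The same check as a few lines of Python, assuming the rounded 850 cell updates per cycle and 3 bits per letter from above. The 1-16 window is the plausibility range stated in the post, not a hardware limit I have measured.

```python
# Rough plausibility check for the SIMD estimate.
PACKED_PER_CYCLE = 850 / (8 / 3) ** 2   # about 120 packed entities per cycle
SIMD_REGISTERS = 8                      # register count assumed in the post

per_register = PACKED_PER_CYCLE / SIMD_REGISTERS
plausible = 1 <= per_register <= 16     # window quoted in the post

print(f"{per_register:.1f} packed entities per register per cycle")
print("within the 1-16 window:", plausible)
```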
I need to look at the code though. I am not sure if the metric is as meaningful as we have been led to believe. Wall clock time still rules, and how well cell-updates-per-second correlates with it is somewhat unknown at this time.