Interesting results from Microsoft's SQLio benchmark on JR4

I’ll have the full set of numbers soon from the tests our customer was running on their shiny new JR4 (they agreed to let us talk about them). One of the more interesting takeaways is that the 24-drive unit appears to deliver a bit north of 5000 IOPS in a number of the random tests, seeking within files larger than RAM. I need to think this through some more.
Anyone can create files that fit within a large RAM cache and seek with minimal latency, getting great IOPS numbers. Once the working set spills out of cache, performance generally craters.
I need to run a more extended set of tests so that I understand this better. Many benchmark reports happily present results from IOzone and other codes run with data sets that fit entirely within system memory, so there is very little actual IO until after the test is over. This has been one of my central criticisms of such testing. Moreover, these tests rarely, if ever, map onto real-world scenarios and use cases. I hear the argument that it is better to use something than nothing, but I don’t think I agree. I’d rather use something with a sound basis behind it than a questionable metric of dubious value.
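To make the cached versus uncached distinction concrete, here is a rough Python sketch of the sanity check I would want before trusting a random-read number. It is not SQLio and not the customer's test. The function names, the 8 KiB block size, and the "file at least 2x RAM" rule of thumb are all assumptions made up for illustration.

```python
import os
import random
import time


def exceeds_ram(file_size_bytes, ram_bytes, factor=2.0):
    """Return True when the test file is at least `factor` times the size of RAM.

    The 2x default is an assumed rule of thumb, not a standard.
    """
    return file_size_bytes >= factor * ram_bytes


def random_read_iops(path, block_size=8192, num_reads=1000, seed=0):
    """Issue `num_reads` random block-aligned reads and return reads per second.

    This goes through the OS page cache, so on a file smaller than RAM it
    mostly measures memory speed rather than the disks.
    """
    rng = random.Random(seed)
    blocks = os.path.getsize(path) // block_size
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    # buffering=0 disables Python-level buffering only; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(num_reads):
            f.seek(rng.randrange(blocks) * block_size)
            f.read(block_size)
        elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse timers.
    return num_reads / max(elapsed, 1e-9)
```

The idea would be to call `exceeds_ram` with the test file size and the machine's physical memory first, and to treat any IOPS figure from `random_read_iops` as a cache measurement whenever it returns False.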