I have long been a proponent of meaningful benchmarks. Meaningful benchmarks are those that can be used with a reasonable level of predictive power to help in sizing and other issues.
I am also a proponent of market/institutional knowledge … if you have been working in HPC for a while, you might have a clue as to how some systems run, some good design points, some really bad ideas (“hey, let’s run a cluster over pairs of SLIP lines”).
Well, I had an amusing exchange on the Beowulf list today. I pointed out that in a cluster, IO is an issue, and that you need to think carefully about how to address it before setting a particular expectation. The person on the other end, who doesn’t have a great deal of experience in HPC (none, from what I can see), seemed put off by this.
One of the things that our customers find valuable is our experience in catching these issues, helping to align/set expectations. The last thing you want to do is to create a scalable resource, and place a decidedly non-scalable element in there at a critical point.

Sadly, this person was hung up on the benchmarking point. He couldn’t get past running meaningful benchmarks, and went on the attack in public and private.
He is now in my filters. I have too much to do to worry about engaging in arguments with people who are yelling my points at me. Silly really.
I have said, many many times, countless times … in public, in private, that:

It is worth noting that the only benchmarks that are really meaningful to most users would use their own codes and their own input data sets

I have said this in published papers, in white papers, in marketing material, online, etc. Getting meaningful benchmarks together is hard work for most people, and few if any actually do this. In these cases, deep knowledge of how certain apps behave, and of the patterns you have to worry about as you scale up, is quite beneficial.
It is naive (dangerously so in many cases) to think that all users will benchmark their own codes, or systems, in such a way as to measure what they need in order to provide good sizing guidance. Moreover, as you scale up your problems to where you want to run them, new elements of serialization emerge. You find the algorithm that you have been using to read your data files fails to scale as you move to very large files (Fluent). You find that algorithms that worked well with 4 nodes fail miserably at 40.
That is, you need to have a deep understanding of your code to predict what will happen as you scale up in node count, in number of processors, in file I/O.
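To put a rough number on the 4-versus-40 node point, here is a minimal sketch using Amdahl's law. This is a textbook model, not a measurement of any real code, and the 10% serial fraction (say, time spent in a serialized file read) is a purely hypothetical value chosen for illustration. The function names are mine.

```python
def amdahl_speedup(serial_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` nodes when `serial_fraction` of the
    single-node run time cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


def parallel_efficiency(serial_fraction: float, nodes: int) -> float:
    """Fraction of the ideal (linear) speedup actually achieved."""
    return amdahl_speedup(serial_fraction, nodes) / nodes


if __name__ == "__main__":
    serial_fraction = 0.10  # hypothetical: 10% of run time is serialized
    for nodes in (1, 4, 40, 400):
        s = amdahl_speedup(serial_fraction, nodes)
        e = parallel_efficiency(serial_fraction, nodes)
        print(f"{nodes:4d} nodes: speedup {s:6.2f}x, efficiency {e:6.1%}")
```

Under that assumed 10% serial fraction, the model gives roughly a 3.1x speedup on 4 nodes, which looks fine, but only about 8.2x on 40 nodes. Real codes are usually worse than this, since the serial portion often grows with problem size and node count, which is exactly why you need to understand your own code.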
If you don’t have this, you can always speak to people with a good background in HPC, who might be able to help. This is part of what we do.
And occasionally, you will get criticized by self-righteous and presumptuous people for doing so.
I guess it comes with the territory.
We live with the fact that most users still use IOzone and bonnie for their IO benchmarking, though neither one is a reasonable approximation to end user IO patterns. We have written a few benchmarks and benchmark drivers in our own day, one of which has been in use by a number of computer vendors, one specifically for marketing purposes. (see page 25).
Basically, most users won’t use their own code as benchmarks, but will substitute “standard” benchmarks and claim similarity. Such is life. SPECfp ratios don’t tell a meaningful story. What does is real application benchmarks. SPECfp numbers give you, at best, approximations to the results.
Yet people rely on them.
This is because, for quite a few people, the cost-benefit analysis of doing their own benchmarks is weighted decidedly in favor of minimal time/effort on their part. Leverage what exists. Use that institutional knowledge.
In some cases this works well. In some it does not (benchmarks can be, and are, gamed … results may not represent real test cases). Again, this is where the experience comes in.
Someone who has a clue about what to build and how to build it, given an app and a little more information, could be quite a valuable resource.
Update: This definition rather precisely fits the person I was attempting to discuss things with.

An Internet troll, or simply troll in Internet slang, is someone who posts controversial and usually irrelevant or off-topic messages in an online community, such as an online discussion forum or chat room, with the intention of baiting other users into an emotional response[1] or to generally disrupt normal on-topic discussion.[2]

I made the mistake of responding to the troll. He has been filtered. He seems to have sent an email to me after he was filtered.
Folks, the best way to deal with trolls is to ignore them. They thrive on the interactions, on the reactions. Like the Monty Python skit, they simply want an argument. Five minutes or half an hour, it doesn’t matter. Just argue with them.
That said, these are precious minutes of your life you can never get back. Hence the need for filters. They are not worth your time. Or mine.