All of us are guilty, at one point or another, of embellishing some attribute of whatever we are talking about. We like our choice to be the “winner”, whatever that means. This “crime” takes many forms.
What we see quite often is omission, either purposeful or inadvertent, which paints a different picture than “reality” would indicate. We also see specious comparisons, and poor analytics to back up the conclusions. Bad measurement tools are also common, as is a complete lack of understanding of error and error analysis.
How many benchmarketers have measured the run times of their codes more than once? How many have formed an average, computed a standard deviation (this assumes a Gaussian distribution of timing results around the average, which may not be a bad first-order guess), and thus computed a real measurement error? And how many of the results that are happily reported as significant are actually noise, or worse, how many of the reported differences are really just points with overlapping error bars?
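To make that concrete, here is a minimal sketch of what "measure more than once" could look like in Python. The helper names (`time_runs`, `summarize`, `error_bars_overlap`) are made up for illustration, and the overlap check uses the simplest possible criterion, mean plus or minus one sample standard deviation, which leans on the same Gaussian guess mentioned above. It is a sanity check, not a proper significance test.

```python
import statistics
import time


def time_runs(fn, repeats=10):
    """Call fn() `repeats` times and return the wall-clock durations in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of timings."""
    return statistics.mean(samples), statistics.stdev(samples)


def error_bars_overlap(samples_a, samples_b):
    """True if the intervals mean +/- one standard deviation overlap.

    Assumption: one-sigma error bars are a good enough stand-in for the
    measurement error. Overlap means the difference may well be noise.
    """
    mean_a, sd_a = summarize(samples_a)
    mean_b, sd_b = summarize(samples_b)
    return abs(mean_a - mean_b) <= sd_a + sd_b
```

If `error_bars_overlap` returns `True` for two machines or two builds, the honest report is "no measurable difference", not "ours wins".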
The art of benchmarketing is in part to prevent the scientific benchmarking of a system or code from being presented without the appropriate spin. Real measurement is a science, and it requires rigorous tools and analytics to be meaningful as a measurement. In order to measure, you need a measuring stick (your benchmark code) and something to measure. If you really want to see if your machine is fast, you check it against the other top-of-the-line machines.
If you want to paint a lopsided comparison in order to bolster a weak argument, you select your fastest machine, and competitive machines from a generation or two back.
Funny thing about those lopsided comparisons … benchmarketers can go to town with them. Doesn’t make them meaningful or useful.
Remember this when you see comparisons between the latest and greatest from company X and something 2 or more generations back from company Y. If someone tells you that company X’s product is much faster based upon something like the above, then someone is trying to blow air up your skirt, pants, or other layers of clothing.
This happened to us this past week. We were looking over the results of some (barely meaningful) benchmarks in the trade rags for some company’s new product, and they were comparing it to a competitor’s product from 2 generations ago.
Uh … sure. Being 20% faster than a product from company Y which is 30% slower than the current generation of company Y’s product is nothing to be proud of. More like something to be ashamed of.
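For anyone who wants the arithmetic spelled out, here is a quick sketch. It assumes "20% faster" and "30% slower" both refer to throughput (work done per unit time), with company Y’s current product normalized to 1.0. If the percentages refer to run times instead, the exact number shifts a little but the conclusion does not. The function name is made up for illustration.

```python
def relative_to_current(speedup_over_old, old_deficit):
    """Speed of product X as a fraction of company Y's *current* product.

    speedup_over_old: how much faster X is than Y's old product (0.20 = 20%).
    old_deficit: how much slower Y's old product is than Y's current one (0.30 = 30%).
    Assumption: both figures are throughput ratios.
    """
    old_vs_current = 1.0 - old_deficit  # old Y, relative to current Y
    return (1.0 + speedup_over_old) * old_vs_current


ratio = relative_to_current(0.20, 0.30)
print(f"X runs at {ratio:.0%} of current Y")  # prints "X runs at 84% of current Y"
```

In other words, under these assumptions the shiny new product is about 16% slower than what company Y is actually shipping today.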