We often get interesting requirements for clusters. Sometimes we speak to people who believe that clock frequency defines the speed of the unit, so a 3.6 GHz processor must be faster than a 2.66 GHz processor. This is not the case (clock frequency != performance): what counts is the work done per clock multiplied by the clock rate. But the frequency story has been hammered home by one OEM (cough cough) for a long time, so their customers are attuned to it. That makes it hard to explain to a customer how a 2.66 GHz Woodcrest could best an older 3.6 GHz Xeon. This is the price you pay for inaccurate information pushed into a marketing channel: you have to deal with it yourself when you change your own tune later on.
Another issue is dealing with the fallout of the old Intel compilers. You know, the ones that happily placed a test for the “GenuineIntel” vendor string at the front of the code, and selected code paths based upon the result? So they didn’t bother to test for functionality, just strings. Which meant that if you ran such compiled code on an Opteron part, you automatically got the worst code path. Some of us call this a bug. Intentional? I dunno. Bad design/implementation? Uh huh. Fallout later on? Yuppers. FWIW, we have seen lots of customers moving to non-Intel compilers due to shenanigans like this. A shame. The latest compilers are pretty good. On all the platforms.
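To make the "test for functionality, not strings" point concrete, here is a minimal sketch of what feature-based dispatch could look like. It assumes GCC or Clang on x86 (the `<cpuid.h>` helper `__get_cpuid` is theirs), and the names `code_path`, `cpu_vendor`, and `select_path` are made up for this illustration. It is not how any particular compiler runtime does it.

```c
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction (x86 only) */
#include <string.h>

/* CPUID leaf 1 feature bits; the same bit positions are used by Intel and AMD. */
#define FEATURE_EDX_SSE2 (1u << 26)
#define FEATURE_ECX_SSE3 (1u << 0)

/* Hypothetical code-path identifiers for this sketch. */
enum code_path { PATH_GENERIC = 0, PATH_SSE2 = 1, PATH_SSE3 = 2 };

/* Read the 12-character vendor string (leaf 0: EBX, EDX, ECX in that order).
 * Fine for logging; a poor basis for choosing a code path. */
void cpu_vendor(char out[13])
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __get_cpuid(0, &eax, &ebx, &ecx, &edx);
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}

/* Choose a path from what the CPU reports it can do, ignoring who made it. */
enum code_path select_path(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return PATH_GENERIC;            /* leaf 1 unavailable: stay safe */
    if (ecx & FEATURE_ECX_SSE3)
        return PATH_SSE3;
    if (edx & FEATURE_EDX_SSE2)
        return PATH_SSE2;
    return PATH_GENERIC;
}
```

With a check like this, an Opteron that reports SSE2 lands on the SSE2 path just as an Intel part would, and the vendor string never enters into it.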
Another poor decision that annoys me is the lack of an int32 vector max or min instruction in SSE. Would love to see it. Up to and including SSE3, I don’t believe it has been done. SSE4? If there is a pointer to the specs, I would like to read it.
Call it a missing feature rather than a poor implementation. Hopefully next time the SSE powers that be will get this right and implement stuff for non-FP people. We have ints and we want to use them nicely. We want to make it easy to use them. Not costly and painful.
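For the curious, here is roughly what the workaround looks like today: a sketch of a per-lane signed int32 max and min built from SSE2 compare-and-mask intrinsics. The function names (`max_epi32_sse2`, `min_epi32_sse2`, `max4_i32`) are invented for this example; the intrinsics themselves come from `<emmintrin.h>`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Per-lane signed 32-bit max, emulated with compare + mask + blend. */
__m128i max_epi32_sse2(__m128i a, __m128i b)
{
    __m128i a_gt_b = _mm_cmpgt_epi32(a, b);        /* all-ones in lanes where a > b */
    return _mm_or_si128(_mm_and_si128(a_gt_b, a),  /* take a where a > b */
                        _mm_andnot_si128(a_gt_b, b)); /* take b everywhere else */
}

/* Per-lane signed 32-bit min: same mask, operands swapped in the blend. */
__m128i min_epi32_sse2(__m128i a, __m128i b)
{
    __m128i a_gt_b = _mm_cmpgt_epi32(a, b);
    return _mm_or_si128(_mm_and_si128(a_gt_b, b),
                        _mm_andnot_si128(a_gt_b, a));
}

/* Hypothetical convenience wrapper: out[i] = max(x[i], y[i]) for 4 ints.
 * Uses unaligned loads/stores so the caller need not align the arrays. */
void max4_i32(const int32_t *x, const int32_t *y, int32_t *out)
{
    __m128i a = _mm_loadu_si128((const __m128i *)x);
    __m128i b = _mm_loadu_si128((const __m128i *)y);
    _mm_storeu_si128((__m128i *)out, max_epi32_sse2(a, b));
}
```

That is a compare, two ANDs, and an OR where a single instruction would do, which is the "costly and painful" part.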