I spoke with the AMD folks during SC and afterward. Someone leaked the info yesterday, and today the errata and patches were detailed on the x86_64 discussion group. I have had the patches for a few days now, and have a BIOS update I need to apply to a motherboard.
That said, this is a particular TLB-cache interaction that, under a very specific set of circumstances, will trigger a machine check exception and hang the machine.
While quite a few of you will knowingly wink at each other that this is the reason 2350s are so hard to get, I think there may be other reasons. Yes, there is a stop ship now; that is what is being reported. But the first batches (quite a few) went to folks who have been waiting for them for a while, for large processor count machines.
This said, I am not happy with the scenario … I would much prefer that AMD push its patch, at least temporarily, to the kernel folks. My understanding is that the patch solves the problem with very minimal impact (the BIOS patch will be more intrusive on performance). I would also have preferred a better disclosure of the problem.
I do appreciate what AMD is going through. Intel has had similar issues with Core2, but they did not stop shipping their processors. A stop ship is the wrong approach.
I think it is worth reiterating that AMD (sorely) needs to score some important firsts going forward. They have a rather unique ability to get a usable 8 core unit out the door, simply by putting two of their quads on an MCM and hooking them together via HT. IBM knows how to do MCM; this is not difficult to do. They simply have to decide to do it. They should understand that if they don’t, Intel will, and will beat them by another year as AMD strives for engineering purity versus shipping product. The former doesn’t pay the bills (or stockholder dividends); the latter does.
(Disclosure: AMD has paid us in the past to write whitepapers, perform benchmarks, give presentations, and do related work.)