Failing 10GbE NICs

I won’t mention the vendor by name here. Suffice it to say, I am unhappy with the failure rate on their NICs. We bought a number of units for internal use as well as for customer use. The NICs would throw various driver exceptions and kernel panic the machines. One was doing this to our central server this past week while I was lighting up KVMs on an app server, kernel panicking under even moderate load.
I got fed up with these crashes. I yanked the card and put a nice shiny new Intel X520-DA2 unit in there instead (so now you know it’s not Intel).
I finished configuring it, and then checked the rest of the gear. Yeah, we have a few more of this particular brand’s cards, which I think we are going to be retiring over the next 2-3 months.
We’ve been told by the card maker that crashes are virtually unknown … that they’ve not heard of such things. We, on the other hand, have first-hand experience with them, in our lab and at our customers’ sites. Part of the reason ICL, in the previous post, was created was specifically to catch these offending subsystems in the act of tossing their cookies and bringing down the machine with them.
This got to be too much for us. If the cards and drivers are well behaved, we’ll leave them alone. Likely we are going to do what we did for the Corsair SSDs a few years ago for anyone experiencing issues. We have a few customers happily running on those ancient Corsair SSDs with nary a problem. Crashing a system the way these NICs did, though, is not such a good idea.