The blame game

This isn’t what you might think from the title. It’s an observation. I hope I don’t misstate what I intend to say, so feel free to chime in if you don’t agree with the wording.
When you have a situation where a customer has a set of vendors, and a problem that needs resolution, the customer will gravitate towards assigning blame for the problem to the most competent of the vendors, the most proactive of the vendors, in the hope that it will be resolved, regardless of whether or not that vendor’s gear/stack is in any way involved.
Ok, one might say “Hah! Wishful thinking.” or “you must say this to keep yourself from going crazy over all the blame for problems you get.” or similar.
Not quite.
We get problem reports, and we take ownership of them. It doesn’t matter what the cause is: we will get to the bottom of them, figure out a workaround or a solution, and recommend or implement it. We don’t care if it goes outside of our box/stack; an issue outside can still affect us.
Sometimes the problems are self-inflicted (and we’ve seen some doozies). Sometimes they are transient. As often as not they are in hardware we cannot control or even look at.
But we do get requests, fairly often, to look at problems, even from customers who have none of our gear. In these cases I believe it’s harder to assign blame for failures to us … it would be a stretch.
In the last several weeks, we’ve seen a number of “we didn’t get the hardware from you, but can you help us solve this problem” type engagements. In some cases these are customer home-brew systems (rarely a good idea, but ok, they happen with significant frequency); in others, they were supplied by an IT vendor with no real understanding of what an HPC system is, or how to design, test, or debug one.
Some of the self-inflicted problems are due to software stack policies that prevent installation of updates or patches that would solve them.
All in all it gets easier for the user to pin the blame on one thing, supplied by one vendor, if that vendor will relentlessly pursue a solution.
This said, some of the things do deserve their bad reputations … early Lustre/Gluster/… were something of a challenge to stabilize.
So I find it interesting, as we approach 10 years in business, after a previous 6 years at SGI and 1.5 at MSC, that the pattern hasn’t changed that much: find the most competent of your suppliers and don’t merely pick their brains, but actively get them to take ownership of the problems.
There is an easier way.
Just ask.
We do this for our paying customers.

3 thoughts on “The blame game”

  1. I think it probably depends on the temperament of the client. I could just as easily see them trying to sweet talk the most competent vendor into helping them with their problem. Really though, who gets blamed and who deserves blame are complicated questions.
    I think it’s entirely possible for a vendor to have made mistakes and have gone above and beyond the call of duty simultaneously. Say the user had requirements that they didn’t initially ask for, and the vendor agrees to completely revamp the solution at the user’s request and at significant personal cost. If mistakes are made implementing the revamped solution, is it reasonable to blame the vendor when they should never have had to do that in the first place?
    Once a new party (could be a vendor, or even internal support) takes over, at what point do they deserve blame for ongoing problems? How much blame does the customer deserve for their part in it?
    I guess my take is that it’s important to document who did what, how they did it, and what the effects were. One should be very careful though, when trying to extrapolate that to assign blame to someone when something doesn’t go right. I think the truth is often far murkier.
