bad design + bad implementation = company success ??? Seriously ???

We are often hired to work on existing systems, to see if we can help make them faster and better. I am working on such a project now, but this post is not about this project.
I’ve noticed a tendency in the market to shoehorn a set of designs for storage/computing systems into areas they weren’t designed for. Moreover, these designs would have been right at home 15 years ago. Since then, scale out designs have come along which do a far better job.
Take the concept of hooking many MANY drives into a RAID controller. Let’s say we fully populate a SAS link with 120-ish drives. And then we break these into 12x 10-drive RAID6s. All hanging off this one controller.
Then we build a machine with several of these controllers. And hang more drives off of them.
Think “filer head”.
Think “massively overburdened RAID”.
Think “slow as molasses”.
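To put rough numbers on "overburdened", here is a back-of-envelope sketch of the 12x 10-drive RAID6 layout hanging off one controller. The per-drive streaming rate (150 MB/s) and the controller ceiling (2 GB/s) are placeholder assumptions I picked for illustration, not measurements of any particular product, and the function names are made up for this example. Plug in your own figures.

```python
# Back-of-envelope sketch: how oversubscribed is a single RAID controller?
# Every number passed in below is an illustrative assumption, not a measurement.

def raid6_data_drives(groups: int, drives_per_group: int) -> int:
    """Drives' worth of data: RAID6 spends two drives' worth per group on parity."""
    return groups * (drives_per_group - 2)


def oversubscription(groups: int, drives_per_group: int,
                     drive_mb_s: float, controller_gb_s: float) -> float:
    """Ratio of what the data drives could stream to what the controller can move."""
    aggregate_gb_s = raid6_data_drives(groups, drives_per_group) * drive_mb_s / 1000.0
    return aggregate_gb_s / controller_gb_s


def per_drive_share_mb_s(groups: int, drives_per_group: int,
                         controller_gb_s: float) -> float:
    """Controller bandwidth available per drive when every drive is busy."""
    return controller_gb_s * 1000.0 / (groups * drives_per_group)


if __name__ == "__main__":
    GROUPS, PER_GROUP = 12, 10     # layout from the text: 12x 10-drive RAID6
    DRIVE_MB_S = 150.0             # assumed streaming rate per drive
    CONTROLLER_GB_S = 2.0          # assumed controller ceiling

    print("data drives:", raid6_data_drives(GROUPS, PER_GROUP))
    print("oversubscription: %.1fx"
          % oversubscription(GROUPS, PER_GROUP, DRIVE_MB_S, CONTROLLER_GB_S))
    print("per-drive share: %.1f MB/s"
          % per_drive_share_mb_s(GROUPS, PER_GROUP, CONTROLLER_GB_S))
```

With those assumed figures the drives could collectively stream several times what the controller can pass along, so each drive gets only a small slice of its own capability. Different assumptions move the ratio, but the shape of the problem stays the same as long as drive count grows faster than controller bandwidth.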
I see this design in HPC and other places. We encounter it in competitive situations. We see people then noting that they can add a nice PCIe Flash card and get a whopping 1+ GB/s out of it.
I gotta scratch my head over this. Seriously?
We have tightly coupled storage and computing systems on the market now that individually sustain over 5 GB/s per unit for TB-sized IO. We have extreme performance Flash and SSD units that individually sustain 10+ GB/s.
We can aggregate these together using scale out software glue. We can tightly couple them using a variety of software stacks to present massive performance at scale.
And this doesn’t take into consideration what is coming.
The bad implementation part is a combination of how people set things up, and how organized their racks are. Systems need to be serviceable. Disorganization leads to inefficient problem resolution.
I spent the last 2 days in an Equinix data center in NJ. As I’m walking past these racks, with spaghetti wiring, I keep thinking about that support issue. How is it possible to manage … or do you make changes by throwing away old kit in favor of new?
I also think about what I run into with gear we are asked to fix or make work better. Sometimes you work with a good admin/manager who knows the value of carefully thought out automation. Sometimes you encounter a haphazard … hapless … default install (in a slightly broken manner) of something not well designed or tuned to the problem at hand.
This post is in part about running into those layers of bad choices … bad overall design, crappy installation, poor integration. They make life harder. I’ve spent hours cleaning up after others. Days in some cases. Cleanup is harder when the infrastructure design is so poorly thought out or implemented that I have to hack around what is in place to do the job right. And that means that the job takes longer. And is more complex.
More things left wrong or poorly done means more cleanup time later on.
The sad thing is that customers paid for some of this crap. And keep paying for it.
The point of this is, there are some companies that are wildly successful in terms of moving gear. Their designs are … er … left wanting … but boy can they ship by the mega-ton.
I guess I shouldn’t complain, as we derive revenue from their failure. But it annoys me all the same. Customers could get better gear, with a better design, and a better implementation. Or … they can keep doing what they are doing, and continue to add to the coffers of those who maybe aren’t doing such a good job.
I like efficiency. I don’t like wasting money on crappy designs.
I can tell you, customers get really unhappy when you show them the designed-in limits, and compare them to systems without those limits (systems with other limits, to be sure, but better, saner ones). Nothing annoys someone as much as throwing good money away. When we hand them the tools we used, show them how to use them, and then they come back with an expanded set of tests, it makes us happy. A little bit of good knowledge can be a dangerous thing … it helps free them from the shackles that bind them to a particular solution.