Scalable Informatics’ 13th anniversary on Saturday

We started the company on 1 August 2002. I remember arguing with a senior VP at SGI in February 2001 over his decision to abandon Linux clusters. That was the catalyst for my leaving SGI, though I was too chicken to start Scalable then. I thought I could do better than they were doing.

I went to another place for 15 months or so. Tried jumpstarting an HPC group there … hired lots of folks, pursued lots of business. Then it went bang. My team was laid off, and I was left with a serious case of Whiskey Tango Foxtrot.

Scalable Informatics was started in my basement. Our foundational thesis was, and is, that performance should be end-user accessible without Herculean effort. You should have a fighting chance to extract performance and leverage it. And it should be easy, if at all possible.

This is what guided the company from the outset. Our path began with clusters, with a detour in 2002-2006 into accelerators. I called them APUs, Accelerated Processing Units. I wrote a bunch of white papers for AMD that used the APU term, and AMD distributed them widely. Now APU is a common term. Go figure.

We tried raising capital to build accelerators, believing (in hindsight, correctly) that they would be one of the most important aspects of high performance computing going forward. Couldn’t get any VC to bite. We even did due diligence with a few, only to later see some of our slides (graphics we had built ourselves) show up in other companies’ decks. That convinced me that VCs weren’t worth the time and effort to deal with.

Transitioned out of clusters once Dell decided it wanted that market. It is very hard to compete with a massively parallel shipping machine that gets better pricing on parts than you can get, and that is willing to suck all the oxygen out of the room (or market) to suffocate others. We focused where we could add a great deal more value.

Hyperconverged systems. From 2006 onward.

Made small efforts to interest VCs … local groups … but not a whisper of interest.

Continued to set performance records with our units. Had competitors looking us over thinking they could build the same thing, discovering rapidly that there was indeed significant special sauce powering our kit.

Had a few acquisition offers in the mix. Ranging from “give us the company and we’ll decide if you are worth anything” on downward. It was actually quite humorous in some cases.

Kept getting better and bigger customers, building bigger and faster systems. Built lower-cost systems that, while not the top of our line, easily bested our competitors’ top-of-the-line, at a fraction of the cost.

Ran a number of standard tests, never reported results. Had customers run tests “in anger” and report that jobs that normally took 6 hours on other gear took 5 minutes on ours. Another customer reported years ago that their 5GB/s system was being looked at by a flash vendor, curious as to whose flash they were using. Customer responded “no flash, just spinning disk”. Left the vendor speechless.

We’ve always said architecture matters. It’s nice to be proven correct, again and again. Our competitors always seem to underestimate us. Please, by all means, continue to do this.

We kept adding people, growing out of my basement in 2007 into a real facility. We are now in our second, and we are bulging out of it. We actually blew out our AC, and have to get a new one.

Took our first external investment in February this year, and it looks like we are going to do some more pretty soon. Had another discussion that took 9 months and went down in flames over terms that were impossible for us to agree to. Exactly the sort of terms you bring up and insist cannot be compromised on if you want to kill a deal. Kill it, they did.

Along the way, we set records on in-box firepower and between-box firepower. Records that are only recently coming under threat.

We’ve got some absolutely wild bits brewing in the lab, things we can’t talk much about now, and it’s killing me … I really want to.

This said, our 13th year and beyond should be quite awesome. More soon. I promise.

Been there, done that, even have a patent on it

I just saw this about a divide-and-conquer approach to massive-scale genomics calculation. While not specific to the code in question, it looked familiar. Yeah, I think I’ve seen something like this before … and wrote the code to do it.

It was called SGI GenomeCluster.

It was original and innovative at the time, hiding the massively parallel nature of the computation behind a comfortable interface that end users already knew. It divided the work up, queued up many runs, and reassembled the output, in as close to the original order as possible. One of my test criteria was taking the md5sum of my code’s output and of the original; if they differed, it failed.
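
For a rough illustration of the pattern (a hypothetical sketch, not the GenomeCluster source; the "search_tool" command and the file names are placeholders), the divide / queue / reassemble / verify flow looks something like this in Perl:

#!/usr/bin/env perl
# Hypothetical sketch of divide / queue / reassemble / verify.
# Not the GenomeCluster code; "search_tool" and file names are placeholders.
use strict;
use warnings;
use Digest::MD5;

my $query   = 'queries.fa';   # multi-record FASTA input
my $nchunks = 8;
my @chunks  = map { "chunk_$_.fa" } 0 .. $nchunks - 1;

# 1. divide: deal FASTA records round-robin into N chunk files
my @out;
open $out[$_], '>', $chunks[$_] or die "open $chunks[$_]: $!" for 0 .. $#chunks;
open my $in, '<', $query or die "open $query: $!";
my ($i, $cur) = (0, undef);
while (my $line = <$in>) {
    $cur = $out[ $i++ % $nchunks ] if $line =~ /^>/;   # each new record may start a new chunk
    print {$cur} $line;
}
close $_ for $in, @out;

# 2. queue + run: serial here; the real thing submitted each chunk to a batch queue
for my $c (@chunks) {
    system("search_tool -i $c -o $c.out") == 0 or die "chunk $c failed";
}

# 3. reassemble the per-chunk outputs in the original order
open my $all, '>', 'combined.out' or die $!;
for my $c (@chunks) {
    open my $part, '<', "$c.out" or die "open $c.out: $!";
    print {$all} $_ while <$part>;
    close $part;
}
close $all;

# 4. verify: md5sum of the reassembled output vs. a single-process reference run
sub md5_of { open my $fh, '<', $_[0] or die $!; Digest::MD5->new->addfile($fh)->hexdigest }
my $ok = md5_of('combined.out') eq md5_of('reference.out');
print $ok ? "outputs match\n" : "MISMATCH: outputs differ\n";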

There were many aspects of this that were (at the time, 1999-2000) quite novel. So we filed a patent on it. Which was granted. It is Patent number 7,249,357 if you care to look.

A next-gen version, avoiding all of the patented elements, was developed at my next employer, which subsequently had a financial meltdown due to a failed acquisition (or more correctly, failed due diligence during the acquisition, so they didn’t uncover the slightly well-done books in time). MSC.Life was lost to the ages.

I left there and started Scalable Informatics. 13 years ago this Saturday.

While the folks at Broad and Google seem to have done wonderful things, they may not have been the first to do this. I myself was inspired by the previous work on HT-BLAST from my colleagues at the time. Some of whom insisted that there was no way a distributed version of this could ever scale … there were simply too many issues. I have great respect for them, but I set out to prove that it could scale. And scale it did.

Later on, a number of very smart folks at a number of places built mpiblast. I worked on helping to package it and automate builds of it.

Paraphrasing Newton, we’ve seen further because we stood on our predecessors’ shoulders, as they built the platforms we could stand on.

This isn’t to minimize what was done. Sort of like the history of the “discovery” of the FFT, which seems to have been “discovered” a number of times. I find that amusing to some degree, but the history of scientific advancement is often composed of half-forgotten and half-remembered things. Quaternions, anyone? Maxwell’s equations in quaternion representation are a single equation. Not to mention their applicability to the Lorentz transformations of special relativity …

Build debugging thoughts

The toolchain we use to provide up-to-date and bug-reduced versions of various tools for our appliances has a number of internal test suites. These suites do a pretty good job of exercising the code. When you build Perl and its internal modules and tools, tests are run right then and there, as part of the module installation.
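
For those who have not seen this in action, the tests a Perl module runs at install time are just small scripts under t/, executed by the standard “perl Makefile.PL && make && make test” sequence. A minimal, hypothetical example of one such test (the module and values here are placeholders):

# t/basic.t -- a minimal, hypothetical test of the kind run automatically
# during the usual "perl Makefile.PL && make && make test" sequence
use strict;
use warnings;
use Test::More tests => 2;

use_ok('List::Util');                             # the module loads at all
is( List::Util::sum(1, 2, 3), 6, 'sum() works' ); # and behaves as expected

You can also run it standalone with “prove t/basic.t”; the point is that every module gets exercised by its own suite before it lands on the appliance.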

Sadly, not many languages do this yet; I think Julia, R, and a few others might. I’d like to see this as part of Python and other tools.

There is also a strange interaction between gcc 4.7.2 and Perl 5.20.2: if we use any optimization level higher than none, one of the test cases fails.

This isn’t a Perl issue per se; it works fine with the 4.9.x compilers and some others. I’ve not yet tried it with clang/LLVM, but should (if I ever get the time).

What I am thankful for are these code builds with the testing. I can see the failure, and have a good concept of what it is, if not exactly where it is. Had I more time, I’d see if I could work around the specific code that gcc 4.7.2 is mis-generating. But it’s easy to use -O0 for now, and not worry about it. I have bigger fish to fry.

I’ve had to work around some pretty insane compiler-language bugs in the past with all manner of interesting parsing errors that only showed up in specific compilation cases.

Since I drive my builds with a makefile, and capture all the output, it’s pretty easy to see what failed. I’ve been meaning to set up a Jenkins CI system in-house, to make the process even more asynchronous, but I find that being able to see the builds in real time sometimes helps me.

So I let them crank off on the side in a window, while I work on other stuff. That way I can get my iterative work done, while remaining quite productive.

[Update] I tried clang/LLVM and it worked (and was very fast). But the issue for the moment is the size of the ramdisk for the appliance, and adding another compiler runtime/toolchain would make it larger. So this will take more time to study properly.

I did note that 5.22.0 was released, so I grabbed that. It seems to build and test properly, and I am not getting the errors I was getting with 5.20.2. Sort of meta-debugging … I am not digging into why I was getting the errors; bumping to 5.22.0 looks like it solves the build problem.

Insanely awesome project and product

This is one of Scalable Informatics’ FastPath Unison systems (well, the bottom part). The top units are clients we are using for testing.



Each of the servers at the bottom is a 4U unit with 54 physical 2.5-inch 6G/12G SAS or SATA SSDs. We have 5 of these units in the picture, and a number of SSDs on the way to fill them up. Think 0.2PB usable of flash, distributed in a very nice parallel file system we work quite a bit with.

The network (not shown; ignore the cat6 spaghetti on the sides … need to talk with the team about this) should be some bloody fast stuff that lets us drive the servers at or near their theoretical max bandwidth … that is, it’s very well matched to these units.

More soon. This is just insanely exciting stuff. Capability class to an insane degree.

Playing “guess which wire I just pulled” isn’t fun

Even less fun when the boxes are half a world away.

Yeah, this was my weekend and a large chunk of today.

This will segue, at some point, into another post on design, (unintended) changes in design, and end-user expectations. It’s hard to maintain an SLO if some of the underlying technology you are relying upon to deliver those objectives (like, I dunno, a wire?) suddenly disappears on you. Or, even more interestingly, when someone needs something (also like a wire), sees it connected to your box, and decides to take it.

There is a reason we do what we do, and a reason we do it the way we do it. I am (continually) blown away by the “but you don’t need X here, we’ll provide it for you” line, as when we get there, we discover that no, they really can’t provide it for us, and yes, the system design requires it to function.

This is when, to steal what a customer once opined here, we resort to cowboy engineering. Or, to put it another way: when you are up to your ass in alligators, it’s sometimes difficult to remember that the objective is to drain the swamp. But success is defined only in terms of draining the swamp, not the number of alligators you have to overcome. Sometimes (ok, often) the alligators are self-imposed … and that’s even more exasperating.

There is a reason we do what we do, and why we do it the way we do it. It’s not to sell more kit. It’s to deliver functional extreme performance, and manageable systems.

Off to class now … need a break.

M&A fallout: Cisco may have ditched Invicta after buying Whiptail

The article is here; take it as a rumor until we hear from them.

My comments:

First, M&A is hard. You need a good fit product-wise (little overlap and great complementary functions/capabilities), and culture/people fit matters.

Second, sales teams need to be on-board selling complete solutions involving the acquired tech. Sometimes this doesn’t happen, for any number of reasons, some fixable, some not.

Third, Cisco is out of the storage game if this is true.

It’s worth noting that this perfectly illustrates one of the points we make when we are on a sales call and are told that a customer wants to buy the “safe” choice, from a known brand name. Safe? Really? If you have business dependencies upon this, how is this choice safe? Because of the brand name? How’s that Sun gear holding up? Etc.

We hear this less and less these days, as people realize that value comes from companies of many sizes, and that risk is about much more than brand. If your entire company is hit by a bus, can your customers keep working with their kit and hire others to support it?

For us, the answer is a resounding and unqualified “YES”, and we state it quite succinctly … Bricking not included. Worth considering before you plunk down money for things … can you support it if the supplier is reorg’ed out of existence? In many cases, the answer is no, for gear/processes that are not open.

On storage unicorns and their likely survival or implosion

The Register has a great article on storage unicorns. Unicorns are not mythical creatures in this context, but very high-valuation companies that appear to defy “standard” valuation norms and hold onto their private status longer than those in the past. That is, they aren’t in a rush to IPO or get acquired.

A venture-capitalist-tracking website has revealed a list of unicorns, which are startups valued at a billion dollars or more. Eight storage companies are in the list; does this mean a glorious outcome for them?

The article goes on to analyze the “storage” unicorns, those in the “storage” field. They admix storage, NoSQL, hyperconverged, and storage-as-a-service. This is my main criticism of the article, as I would define only 2 of the entries as storage companies. For the rest, storage is a byproduct of what they do.

And that is about all the criticism I’ll level at this article, as the rest of it is pretty close to dead on correct.

Basically, the article goes on to (succinctly) analyze the competitive nature of the business, the companies’ offerings, and their real traction (where available). It’s not comprehensive, it’s not in-depth, but it’s a good first pass at an analysis document that one might like to expand upon.

In the article, they note that the storage-as-a-service, Hadoop, and storage appliance companies may not necessarily have staying power. This said, perception is reality in the market, so Pure Storage is a success by that measure, regardless of its actual data … that is … until IPO. Then it’s all about the “what have you done this quarter, this year”, etc.

More interestingly, they rate the hyperconverged systems highly, with NoSQL and the data reduction tech also in the mix.

Hyperconverged systems are what the day job is all about. We build the fastest, densest units in the market, and have a number of very cool things coming to complement this. It’s a different side of the same market from the other hyperconverged players (they are focusing upon VDI and VMs), but … it’s converging … as you can’t really build very dense systems upon poorly designed, poorly performing kit. This is where our incredible firepower comes into play. Density is a function of how much performance you can leverage per rack U, per watt, per port.

Tools for linux devops: lsbond.pl

Slowly but surely, I am scratching the itches I’ve had for a while with regard to data extraction from a running system. One of the big issues I deal with all the time is extracting the state and components (and their states) of a Linux network bond. It’s an annoying combination of /sys/class/net, /proc/net/bonding/, and the ethtool/ip commands. So I decided to simplify it.

bond0:	mac 00:11:22:33:44:55
	state   up
	mode load balancing (xor)
	xmit_hash layer2+3 (2)
	polling 100 ms
	up_delay 200 ms
	down_delay 200 ms
  ipv4 10.20.30.40/16 
  ipv6 fe80::123:00ff:fe80:4455/64 
  eth2: mac 00:11:22:33:44:55, link 1, state   up, speed 10000, driver ixgbe
  eth3: mac 00:11:22:33:44:56, link 0, state down, speed 65535, driver ixgbe
  eth4: mac 00:11:22:33:44:57, link 1, state   up, speed 10000, driver ixgbe
  eth5: mac 00:11:22:33:44:58, link 0, state down, speed 65535, driver ixgbe

Eventually, I’ll provide CSV and JSON output modules. But this is the first of many. Look at/grab the source on GitHub.
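
For a flavor of where the data comes from (a simplified sketch, not the actual lsbond.pl source; the field parsing is approximate), most of it is just a matter of reading /proc/net/bonding/<bond> and /sys/class/net/<iface>/:

#!/usr/bin/env perl
# Simplified sketch of pulling bond + slave state from procfs/sysfs.
# Not the actual lsbond.pl code; output format and parsing are approximate.
use strict;
use warnings;

my $bond = shift // 'bond0';

# overall bond and per-slave info live in /proc/net/bonding/<bond>
open my $pb, '<', "/proc/net/bonding/$bond" or die "no such bond $bond: $!";
my ($slave, %slaves, %bondinfo);
while (<$pb>) {
    chomp;
    if    (/^Slave Interface:\s*(\S+)/) { $slave = $1 }
    elsif (/^Bonding Mode:\s*(.+)/)     { $bondinfo{mode} = $1 unless $slave }
    elsif (/^MII Status:\s*(\S+)/)      { $slave ? ( $slaves{$slave}{state} = $1 )
                                                 : ( $bondinfo{state}      = $1 ) }
    elsif (/^Speed:\s*(\S+)/ && $slave) { $slaves{$slave}{speed} = $1 }
}
close $pb;

# per-interface bits (MAC address, etc.) live in /sys/class/net/<iface>/
sub sysfs_read {
    my ($if, $attr) = @_;
    open my $fh, '<', "/sys/class/net/$if/$attr" or return '?';
    chomp( my $v = <$fh> );
    return $v;
}

printf "%s:\tmac %s  state %s  mode %s\n",
    $bond, sysfs_read($bond, 'address'), $bondinfo{state} // '?', $bondinfo{mode} // '?';
for my $s (sort keys %slaves) {
    printf "  %s: mac %s, state %s, speed %s\n",
        $s, sysfs_read($s, 'address'), $slaves{$s}{state} // '?', $slaves{$s}{speed} // '?';
}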

Day job growing

We brought on a new business development and sales manager today. Actually based in Michigan. Looking forward to great things from him, and we are all pretty excited!

Gmail lossy email system

For months I’ve been noticing that my 2 different GMail accounts (one for work on the business side using Google Apps for Business, and yes, paid for … and one personal) are not getting all the emails sent to them.

I’ve had customers reach out to me here at this site, as well as call me up, to ask if I’ve been getting their email. It seems I’m not the only one, though the complaint there appears to be about a bad filtering and characterization system. My complaint is about emails that do not make it through at all.

Back in the day, when I used to run my own email servers for the office, I didn’t normally have a problem with mail. We had a deeply pipelined spam/virus filter that could handle pretty much anything we or others threw at it. Even this site has that same tech, and we’ve used it for years. It handled attacks just fine. We wouldn’t lose mail. That is something that is simply not acceptable.

You can mischaracterize it. False negative/positive on spam. But losing it? Never. Unacceptable.

So why is GMail losing so much of my mail? I don’t have a transmission or loss probability right now, but I am thinking, seriously, of automating a set of tests to see what fraction gets through.
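
If I do, it would likely be something simple along these lines (a hedged sketch; the hostnames, addresses, and counts are placeholders, and the receiving side would be a manual or IMAP count of how many tagged messages actually arrived):

#!/usr/bin/env perl
# Sketch: send N uniquely tagged probe messages through our own outbound SMTP
# server, log the tags, then later count how many tags show up in the GMail
# inbox (or spam folder).  Hostnames and addresses below are placeholders.
use strict;
use warnings;
use Net::SMTP;

my $n    = 100;
my $from = 'probe@example.com';
my $to   = 'me@example.com';        # the GMail-hosted address under test

my @sent;
for my $i (1 .. $n) {
    my $tag  = sprintf "delivery-probe-%d-%d", time, $i;   # unique, grep-able tag
    my $smtp = Net::SMTP->new('smtp.example.com', Timeout => 30)
        or die "cannot connect to SMTP server";
    $smtp->mail($from);
    $smtp->to($to);
    $smtp->data();
    $smtp->datasend("To: $to\nFrom: $from\nSubject: $tag\n\n");
    $smtp->datasend("test body for $tag\n");
    $smtp->dataend();
    $smtp->quit;
    push @sent, $tag;
    sleep 2;                        # don't look like a spam burst
}

# Later: search the inbox (and spam folder) for each logged tag and compute
# received/sent to get an actual delivery fraction.
open my $log, '>', 'probe-tags.txt' or die $!;
print {$log} "$_\n" for @sent;
close $log;
print scalar(@sent), " probes sent; tags logged to probe-tags.txt\n";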

No, the emails don’t wind up in SPAM. They don’t wind up anywhere.

Very frustrating. And I don’t like black boxes I can’t look into.
