Video interview: face melting performance in #hpc #nvme #storage @scalableinfo

Oh no … we didn’t say “face melting” … did we?

Oh. Yes. We. Did.

The interview is here at the always wonderful

You can see the video itself here on YouTube, but read Rich’s transcript. I was losing my voice, and he captured the whole interview in text.

Take-home messages: insane I/O, networking, and processing performance; small footprint; tiny price; available for order now.


There are no silver bullets, 2015 edition

In February 2013, I opined (with some measure of disgust) that people were looking at various software packages as silver bullets: magical bits of a stack which could suddenly transform massive steaming piles of bits (big … uh … “data”?) into golden nuggets of actionable information. Many of the “solutions” marketed these days are exactly like that: “add our magic bean software to your pipeline and you will gain insight faster,” or other such pablum.

This year’s nonsensical silver bullets all seem to center around breathless exhortations on a few specific things.

First: Hardware doesn’t matter. We hear this from customers repeating what their consultants say. The funny thing is, they are provably, demonstrably wrong. Hardware is an instance of an architecture, and architecture MATTERS VERY MUCH in terms of performance, performance density, control of costs, etc. If you have a single box that costs 1.2-1.4x what your other boxes cost, but does 2-4x the workload, do you save money assuming a constant workload? Yes, of course you do. Does it give you an unfair competitive advantage over others who believe this, for use cases where performance and architecture actually matter (big data, storage, SDS/SDN/etc., high performance data analytics, etc.)? … Yes, yes it does.
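
To make the arithmetic concrete, here is a trivial sketch in Perl (the prices and workload figures are hypothetical, chosen from the ranges above purely for illustration):

#!/usr/bin/env perl
# Back-of-the-envelope cost per unit of work, using hypothetical
# numbers drawn from the ranges in the paragraph above.
use strict;
use warnings;

my %cheap = ( price => 10_000, workload => 1.0 );   # baseline box
my %good  = ( price => 13_000, workload => 3.0 );   # 1.3x the cost, 3x the work

for my $pair ( [ cheap => \%cheap ], [ good => \%good ] ) {
    my ( $name, $box ) = @$pair;
    printf "%-5s box: \$%.0f per unit of work\n",
        $name, $box->{price} / $box->{workload};
}
# cheap box: $10000 per unit of work
# good  box: $4333 per unit of work  -- architecture wins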

Second: If we add an NVMe/SSD/magical pixie dust, our old slow terrible architecture will be as fast as your monster. Uh, no. Not even close. And I re-refer you to the afore-linked study showing our older, slower (but architecturally superior) system as compared to a newer, shinier set of parts with a weak architecture (the good box vs. the cheap-per-unit-cost box).

The second point usually draws exasperation from those who can’t/shan’t/won’t believe that it is possible. I mean, how in all things that are good in this universe could this little upstart bootstrapped company with no Valley investors, from … where … freaking Michigan? … build better stuff than a unicorn, or a massive hundred-billion-dollar-value company? There is just no freaking way … it’s impossible.

Or so some think, until they try it. And they realize that not only is it possible, but it happened. And is happening, and continues to happen.

I’ll talk about this more soon, but architecture matters. Far more than per-unit cost. Because if you have a crappy architecture, you need many more of the low-unit-cost boxen to make up the performance difference. It doesn’t matter if they are yours, or in the cloud.

Something to ponder.


The 1980s called and want their software licensing models back

So here I am, the day before Thanksgiving, fighting a battle with a reluctant license server that wants to compute a hash of internal bits on a machine, in order to unlock a license key and let software run.

This is not for us, but for a customer. At their site.

This is the same model from the 1980s and early 90s: create a hash from a collection of machine attributes (or a dongle you attach to a serial/parallel port).

Tie the license hard to a particular machine.
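
In sketch form, this kind of node-locking amounts to something like the following (hypothetical Perl, not the vendor’s actual implementation):

#!/usr/bin/env perl
# Hypothetical sketch of 1980s-style node-locking: hash a pile of
# "stable" machine attributes into a fingerprint the license key is
# tied to. Not the vendor's actual code.
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# None of these inputs are actually stable over the life of a machine.
my $kernel = qx(uname -r);                          # changes with every kernel update
my $nics   = join ',', glob '/sys/class/net/*';     # order changes with driver enumeration
my $cpus   = qx(grep -c processor /proc/cpuinfo);   # CPU indexing/count can change

my $fingerprint = md5_hex( join '|', $kernel, $nics, $cpus );
print "machine fingerprint: $fingerprint\n";
# Change ANY input above and the fingerprint -- and hence the
# license key -- is invalidated.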

This works “great” until something, anything at all, changes on the machine. And when I say anything, I mean ANYTHING.

New network driver that enumerates PCI bus functions differently than the old one? Sorry, you need a new license key.

New kernel? Yup, new license key.

PCI bus enumeration in a different order? Yuppers. New key.

CPU indexing (serial numbers available per CPU in each one) in a different order on startup? Absolutely need a new key.

This is so 1980s/1990s. And it’s completely broken.

A large customer can’t get their work done, because the software licensing is so completely borked that it gets in the way.


I am not arguing for GPL here. I am arguing for sane design of software licensing. If you really want to tie it to a machine, make damned sure you can accept changes like, I dunno, out-of-order enumeration, or new kernels, or new drivers, or … without causing the customer grief before a major holiday when they have a helluvalotta work queued up to run.
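
A minimal sketch of what “more forgiving” could look like, under the assumption that you hash each attribute separately and accept a majority match rather than failing on any single change (hypothetical code, not a shipping implementation):

#!/usr/bin/env perl
# Hypothetical sketch of a more forgiving node-lock: hash each
# attribute separately, and accept the license if a majority still
# match, instead of failing when any single one changes.
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my @current  = map { md5_hex($_) } ( qx(uname -r), qx(hostname), '0000:03:00.0' );
my @licensed = @current;                  # hashes recorded at key-issue time
$licensed[0] = md5_hex('older-kernel');   # simulate a kernel upgrade since then

my $matches = grep { $current[$_] eq $licensed[$_] } 0 .. $#current;
my $needed  = int( @current / 2 ) + 1;    # a majority must still match
print $matches >= $needed ? "license OK\n" : "license invalid\n";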

Lower friction computing almost always wins. Make licensing lower friction, less brittle, more friendly, more forgiving of system “changes” which naturally occur these days in the era of virtualization and containerization.

/rant over


A wonderful read on metrics, profiling, benchmarking

Brendan Gregg’s writings are always interesting and informative. I just saw a link on Hacker News to a presentation he gave on “Broken Performance Tools”.

It is wonderful, and succinctly explains many things I’ve talked about here and elsewhere, but it goes far beyond what I’ve grumbled over. One of my favorite points in there is slide 83: “Most popular benchmarks are flawed”, with a pointer to a paper (easy to Google for).

This, honestly, should be required reading for anyone looking at the performance analysis side of devops.

But most striking to me is slide 114, which I’ve discussed a corollary to in the past: the principle of maximum competence. Find the folks who are smart enough to help you figure something out. Blame their bits. Watch them get all agitated and defensive, so that they are on a mission to find and solve that problem. Let them solve it. Iterate.

We’ve experienced that quite a bit.

Also, Brendan’s graphics are wonderful. Look at slide 120.

Slide 121 needs to be made visible to … laser engraved, mebbe? … onto the brains of all the good folks who want to pull thousands of “metrics” per system, across thousands of systems, into a database for analysis/detection/…

Highly recommended reading.


Massive Unapologetic Firepower part 3: Forte

Forte has uncloaked, and the website is being updated. You can email me for more info.

Pictures speak louder than words. Have a look.

That is 20+ GB/s for streaming sequential IO.

Then, 4kB random reads …

That is 5+ million IOPs.

Specs include

The price point for this is $50k for 48TB, about $1/GB. Pre-order now; shipping in a few weeks.


Shiny #HPC #storage things at #SC15

Assuming everything goes as planned (HA!), we should have a number of very cool things at SC15.

100Gb is awesome. The first time I ran a bidirectional iperf test and saw 20GB/s … it blew me away. 40/56GbE is old hat now, and 10GbE is in the rapidly receding past. If you aren’t at 40/56GbE or IB now, and planning for 100Gb, you are definitely behind the times. If you are at 10GbE and looking at 40Gb as a core switch tech … well … you should probably rethink that position.

Not only is it awesome, it is not expensive. Talk to us (booth 580) if you want to know more. I am not talking about perceived TCO value, I am talking acquisition cost.

Forte. You will hear more about this very soon. A teaser, though: my test two days ago on a partial (literally a fractional) configuration, in a customer-usable config, netted a sustained 2.4M IOPs. The full configuration will be available on the show floor, and it represents the massive unapologetic firepower you’ve come to expect from us.

Combine that with the price points (will announce at show) … yeah, this is a very awesome system. You’ll see. We are taking pre-orders at the show. See the booth for details on the offer.


Moving inventory out to make room for new stuff

We have a bunch of units to move out. These are from a recent POC project, and we have a new architecture project that needs all that rack space and then some … the team is building Franken-boxen clients for this project, so we have enough requestors on the network.

Parts start arriving next week for that, and we really need to clear this out soon. I hate seeing good gear sitting idle on a storage shelf when it could be helping solve hard problems. Email me for info/quote. Fast sales (units are ready to ship).

And given that we are motivated to move it out, no reasonable offer refused.

Here’s the kit for sale:

  • 5x Unison all-flash boxes with 32TB usable space (2 DWPD), 2x 40GbE ports (can do 2-4x 10GbE and/or InfiniBand), 128GB RAM, 12 Haswell cores (Intel E5-2620 v3 @ 2.40GHz). Each unit sustains more than 7GB/s writes and 14GB/s reads, with measured IOP rates well north of 300k for 8k random reads.
  • 1x Unison siFlash/Cadence with 16TB usable space (10 DWPD), 2x 40GbE ports (can do 2-4x 10GbE and/or InfiniBand), 128GB RAM, 20 Haswell cores (Intel E5-2687W v3 @ 3.10GHz). Each unit sustains more than 10GB/s writes and 20+GB/s reads, with measured IOP rates well north of 600k for 8k random reads.
  • 2x Unison spinning-disk boxen with ~400TB usable space, 4x 10GbE ports (can do 2+x 40GbE and/or InfiniBand). Each box sustains ~6GB/s on streaming IO.

Again, feel free to email or call me (734 786 8423 x121). First come/first served.


Cat peeking out of bag: Schedule of presentations and talks in our booth for SC15 is up

I mentioned previously that we have some new (shiny) things … and it looks like you’ll be able to hear about them at my talk.

See the schedule for timing information.

This said, please note that we have a terrific lineup of people giving talks:

    You may see me talking about Forte. Yes, you may see that. Quite likely.

    What is Forte?

    (big evil grin)

    I’ll save that for another post.


sios-metrics core rewritten

This was a long time coming. It was something I needed to do in order to build a far better code base: one using less network bandwidth and less CPU power, and providing a better overall system.

In short, I ripped out the graphite bits and wrote a native interface to InfluxDB. This interface will also be adapted to kdb+ (32-bit edition) and graphite, as time allows.

In the process, I cleaned up a tremendous amount of code. I removed lots of excess debugging bits. Fixed some very annoying problems.

I changed the way data is transmitted. Part of the reason I ripped out the graphite bits was that I felt they encouraged a very suboptimal metric specification/transmission mechanism. Sure, there is a “pickling” version, but even that is highly inefficient. The mechanism I have now is far denser, though it is still not perfect. I’ve got a nice idea for an even denser mechanism (very easy to parse) that should work out nicely in the coming months, and it will require only a slight change to the output code.

The output pathway had been quite fragile, and it was unable to easily cope with a server going offline for a bit. I’ve improved this some, but will do a better job on connecting to primary/secondary/tertiary servers in the coming months.

And, by the way, the configuration and plugin system is much better. The global system configuration now looks like this:

# config-file-type: JSON 1
{
   "global" : {
      "log_to_file" : "1",
      "log_name"    : "/tmp/metrics-$system.log",
      "run_dir"     : "/dev/shm",
      "db" : {
         "default" : {
            "host"  : "localhost",
            "port"  : "8086",
            "proto" : "http",
            "db"    : "unison"
         },
         "second" : {
            "host"  : "",
            "port"  : "2003",
            "proto" : "tcp",
            "db"    : "fastpath"
         }
      }
   },
   "metrics" : {
      "plugin_dirs" : ["plugins/"]
   }
}

Note that you specify plug-in directories, and the code automagically searches for “.json” files in there, which describe the plugins. An example of such a file is this:

# config-file-type: JSON 1
{
   "metric" : {
      "enabled"    : 1,
      "command"    : "plugins/",
      "interval"   : 1,
      "timeout"    : 2,
      "persistent" : 1,
      "xmit"       : 1
   },
   "alerts" : {
      "hot" : {
         "condition" : "_coretemp_ > 80.0",
         "message"   : "Warning: CPU temp greater than 80",
         "severity"  : 5,
         "action"    : ["alert"]
      }
   }
}

In this file, I specify the plugin command (which can be written in ANY language that can run on the machine), the sampling interval in seconds, the response timeout in seconds, whether or not the plugin is persistent (i.e. stays running as a process and streams output to STDOUT, instead of being invoked each time a sample is needed), and whether or not to transmit the results.
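
To make that concrete, here is a minimal sketch of what a persistent plugin could look like (hypothetical code, not the actual shipping plugin; a real one would read /sys/class/hwmon rather than fake its temperatures):

#!/usr/bin/env perl
# Hypothetical minimal persistent plugin: emits one batch of samples
# per second in InfluxDB line-protocol form, with a sync marker per
# batch. Not the actual shipping plugin.
use strict;
use warnings;
use Sys::Hostname;

$| = 1;    # autoflush STDOUT; without this the parent sees no output

my $host = hostname();
while (1) {
    printf "#### sync:%d\n", time();
    for my $core ( 0 .. 3 ) {
        my $temp = 50.0 + rand(10.0);    # fake reading, for illustration only
        printf "cputemp,core=%d,machine=%s,socket=0 coretemp=%.1f\n",
            $core, $host, $temp;
    }
    print "\n";
    sleep 1;
}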

When I run the plugins (they must be able to run entirely on their own) I get something like this:

landman@lightning:~/work/development/sios-metrics$ plugins/ 

#### sync:1445922362
cputemp,core=0,machine=lightning,socket=0 coretemp=54.0
cputemp,core=1,machine=lightning,socket=0 coretemp=55.0
cputemp,core=2,machine=lightning,socket=0 coretemp=56.0
cputemp,core=3,machine=lightning,socket=0 coretemp=53.0

#### sync:1445922363
cputemp,core=0,machine=lightning,socket=0 coretemp=54.0
cputemp,core=1,machine=lightning,socket=0 coretemp=54.0
cputemp,core=2,machine=lightning,socket=0 coretemp=56.0
cputemp,core=3,machine=lightning,socket=0 coretemp=53.0

This is my laptop, BTW. I have hooks in place for the system to respond to different signals (HUP to close and rotate logs, USR1 to reread configs).

Also, notice the “alerts” section. This code is still in progress, but the idea is to decide upon alerts locally, at the appliance level. There are global/holistic issues, and there are local issues; getting one system to handle/decide upon both is an exercise in futility. So local alerts will generate signals to the alerting system.
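
As a sketch of the local-evaluation idea (this assumes the _metric_ placeholders in the condition get substituted with current values before evaluation; it is not the actual implementation):

#!/usr/bin/env perl
# Hypothetical sketch of local alert evaluation: substitute current
# metric values into the condition string, then evaluate it.
use strict;
use warnings;

my %alert = (
    condition => '_coretemp_ > 80.0',
    message   => 'Warning: CPU temp greater than 80',
    severity  => 5,
);
my %current = ( coretemp => 83.2 );

# Replace _name_ placeholders with the latest sampled values.
( my $expr = $alert{condition} ) =~ s/_(\w+)_/$current{$1}/g;
if ( eval $expr ) {
    printf "[severity %d] %s (coretemp=%s)\n",
        $alert{severity}, $alert{message}, $current{coretemp};
}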

This is decidedly not a re-invention of a wheel. We have very different goals for this measurement, monitoring and alerting system than most of the others we’ve seen. And these will be unfolding and becoming more obvious over the next several months.

Once I get the rest of the JSON files constructed for the plugins, and the relevant plugins rewritten, I’ll update the public repo with a new branch/tags. More soon.

And for those really interested: I spent far too long trying to figure out why I wasn’t seeing output from some of the plugins. Turns out $| (Perl’s output autoflush variable) is very important when running in a subshell. Go figure.
