Selling #HPC things on ebay

Given that the (now former) day job has ended, I am selling some of the old day job’s assets on ebay. We’ve sold some siFlash, Unison, and have current listings for Arista and Mellanox switches. More stuff will be listed in short order, check it out here. Feel free to reach out to me at joe.landman at the google mail thingy if you want to talk about any of these things, or buy before I list them. Literally everything must go, no reasonable offers rejected (on ebay or via email).

Viewed 131108 times by 8742 viewers

I always love these breathless stories of great speed, and how VCs love them …

Though, when I look at the “great speed”, it is often on par with or less than Scalable Informatics sustained years before.

From 2013 SC13 show, on the show floor, after blasting through a POC at unheard of speed, and setting long standing records in the STAC-M3 benchmarks …

Article in question is in the Register. Some of the speeds and feeds:

  • 200 microsecs latency
  • 45GBps read bandwidth
  • 15GBps write bandwidth
  • 7 million IOPS

But then … a fibre connection. And … its an array. Not an appliance. So deduct points.

Respectable read bandwidth, chances are they are doing this as reading compressed data, and then counting the uncompressed data as what was read, missing the decompression step. Write perf is low, should be higher. Would need more data on the IOPs to say one way or the other, how did they measure, etc.

FWIW, in 2012/2013, Scalable Informatics sustained 30+ GB/s read bandwidth on our siFlash unit for 128 threads of IO, and about 3M IOPs for 128 threads of random 8k reads. In 2015, we hit 24GB/s and 5M IOPs on v1 of Forte. v2 of Forte never saw the light of day because we ran out of money. Specs on that (estimated) are 50GB/s 10M IOPs in a 2U container.

And Scalable never got any VC love or cash. A shame, because we set very long standing records in a number of areas, that others are, to some degree, still catching up to, years later.

I’m reminded of a little bit of revisionist history put out by IBM at the time, that storage blogger Robin Harris recalled in his great StorageMojo blog.

Here is a paraphrase

They Say (random VC backed “performance” storage stealthy startup) Entry Into high performance storage Will Legitimize The Market.

The Bastards Say, Welcome.

Viewed 123706 times by 8609 viewers

pcilist: because sometimes you really, really need to know how your PCIe devices are configured

If you don’t know what I am talking about here, that’s fine. I’ll assume you don’t do hardware, or you call someone else when there is a hardware problem.

If you think “well gee, don’t we have lspci? so why do we need this?” then you probably have not really tried to use lspci to find this information, or didn’t know it was available.

Ok … what I am talking about.

When a PCIe bus comes up, the connections negotiate with the PCIe hub. The negotiate width (e.g. how many lanes of PCIe will they consume), speed (e.g. signalling speed in terms of GT/s), interrupts, etc. The PCIe hub presents this information to the OS, though in some cases, OSes like Linux choose to enumerate/walk the PCIe tree themselves … because … you know … BIOS bugs.

Ok. So these devices all autonegotiate as part of their initialization. Every now and then, you get a system where a card autonegotiates speeds or widths lower than expected. The driver generally provides the information on what it is capable of, and the PCIe hub, or OS structure, tells you the actual state.

Why is this important … you might rightly ask ?

So you have two machines. They connect over a simple network. The network speed is lower than the PCIe speed when the unit is operating at full capability. One simple estimate of the maximum possible speed of a PCIe system is take the number of GT/s and multiply it by the width. Divide that by 10. That is your approximate bandwidth in GB/s.

So your two machines have fast network cards (note this also works for HBAs … heck … everything … though … be careful about the power control systems, as they may mess with some of these things). You start using iperf to generate traffic between the two machines. And you see it is way below where you expect.

So, you start looking for why this is the case.

Latest drivers: check
Up to date kernel: check
Switch behaving well: check

Hmmm …. Something is amiss.

Then you try between other machines. Every other machine to the non-suspect machine is giving you reasonable numbers.

The suspect machine is giving you crappy numbers to/from it.

In the network scenario, you also see many errors/buffer overruns. Which means that the kernel can’t empty/fill buffers fast enough. Which suggests some odd speed issue.

Ok … where do you look, and what do you look for?

Pat yourself on the back if your hand shot up and you said, with confidence, ‘lspci’. Or parsing the /sys/… tree by hand. Either will work. Lets focus on lspci for the moment.

Ok, great. Now what information within lspci output do you want, and which options do you use?

They suggest -m or -mm for machine parseable output.

I am going to avoid those options. Try them, and see why for yourself.

You see, to get the juicy bits you need, you will need to give 3 v’s. -vvv . And to get a little more info, add a -kb to get driver and other info.

Now, look at that joyus output. Again, what info do you need?

Look at LnkCap: and LnkSta:

That’s what you need.

Wouldn’t it be nice if this were output in a nice simple, tabular form … so you could … I dunno … see your problem right away?

Well, your long wait is over! For only $19.95, and a quick trip to, you too can grab all this info incredibly quickly. Don’t believe me? Well then, have a gander:

landman@leela:~/work/development/pcilist$ sudo ./ 
PCIid   MaxWidth ActWidth MaxSpeed ActSpeed     driver       description
00:00.0        4        0        5                           Intel Corporation Haswell-E DMI2 (rev 02)
00:01.0        8        0        8      2.5         pcieport Intel Corporation Haswell-E PCI Express Root Port 1 (rev 02) (prog-if 00 [Normal decode])
00:02.0        8        1        8      2.5         pcieport Intel Corporation Haswell-E PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
00:02.2        8        8        8        8         pcieport Intel Corporation Haswell-E PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
00:03.0        8        0        8      2.5         pcieport Intel Corporation Haswell-E PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
00:03.2        8        8        8        5         pcieport Intel Corporation Haswell-E PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
00:1c.0        1        0        5      2.5         pcieport Intel Corporation Wellsburg PCI Express Root Port #1 (rev d5) (prog-if 00 [Normal decode])
00:1c.4        4        1        5      2.5         pcieport Intel Corporation Wellsburg PCI Express Root Port #5 (rev d5) (prog-if 00 [Normal decode])
02:00.0        1        1      2.5      2.5    snd_hda_intel Creative Labs SB Recon3D (rev 01)
03:00.0        8        8        8        8          mpt3sas LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
05:00.0        8        8        5        5            ixgbe Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
05:00.1        8        8        5        5            ixgbe Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
07:00.0        1        1      2.5      2.5                  ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03) (prog-if 00 [Normal decode])
80:02.0        8        8        8        8         pcieport Intel Corporation Haswell-E PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
80:03.0       16       16        8      2.5         pcieport Intel Corporation Haswell-E PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
82:00.0       16       16        8      2.5           nvidia NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2) (prog-if 00 [VGA controller])
82:00.1       16       16        8      2.5    snd_hda_intel NVIDIA Corporation Device 0fbc (rev a1)

Notice here how the NVidia card throttled down. When you start using it actively, it throttles up in speed.

But, if you have a nice 40GbE card, say an mlx4_en based card, and you see 5GT/s and x4 on the width, that gets you to about 2GB/s maximum. So you’ll see somewhat less than that on your network.

This is what I saw today. And I wanted to make it easy to spot going forward.

Viewed 128804 times by 8026 viewers


She's dead Jim!

This is the post an entrepreneur hopes to never write. They pour their energy, their time, their resources, their love into their baby. Trying to make her live, trying to make her grow.

And for a while, she seems to. Everything is hitting the right way, 12+ years of uninterrupted growth and profitable operation as an entirely bootstrapped company. Market leading … no … dominating … from the metrics customers tell you are important … position.

And then you get big enough to be visible. And you realize you need more capital to grow faster.

And you can’t get it, because, well … Detroit? 12 years of operations and you aren’t google sized?

Then you try to find capital. You go through many paths. One group complains that Pure storage is younger than you so you should be bigger than them, and you have a serious case of WTF. Another sets up what looks to be an awesome deal, until they demand all your IP for free forever, to do whatever they want with. Then another group … . VCs tell you things, you pitch them, and they can’t get to yes*

Well, lets just say this is the non-fun side of things.

Or when you go visit customers, they love your product, the benchmark results are mind blowing … and then the CIO/CFO kills the deal because, you know … you are too small.

And then, while you are talking to potential acquirers, your bank holding your LOC decides to blow you up. In the middle of the conversation. Literally.

Did I mention that this was the non-fun side of things?

You realize that while you may have been able to pull rabbits out of hats for the last 14 years, that you just hit a perfect storm. A veritable cartoon buzzsaw from which there really was no escape.

Your baby … the one you put so much energy, time, personal money into. Your baby was dying.

And there was little you could do to stop it.

There is a surreal quality to this. One not well captured in words. Not unlike an observer in a different temporal reference frame (e.g. you clock is running at a different speed), while all around you things are happening in real time.

There are the 5 stages of grief and coping.

You reflect on what you started, on what you contributed to. Small sampling would be defining and using the APU term for AMD, building tightly coupled computing and storage systems before hyperconvergence was a thing, designing accelerators and a whole market/sales plan before GPGPU was a thing, getting major customers signed up to a grid concept before there was a grid, and more than a year before there was AWS.

You reflect on your failures, or things you failed at. Raising capital in 2002-2006 to build accelerators. Raising capital in 2004-2006 to build a cloud. Raising capital in 2008 to build extreme density machines. Raising capital in 2011-2013 to build tightly coupled systems. Raising capital in 2014-2016 to build appliances.

You reflect on your successes. A small sampling: Taking a business from zero to multiple million dollar revenue without external investment (though I did try to get it). Setting very hard to beat benchmark numbers by designing/building the best architecture in market. Solving significant automation, support problems with better implementations, and skipping broken subsystems.

We’ve had people offer to “buy” us. For free. Come work for us, they said, and maybe, maybe we’ll pay you. Others have asked the amount on our LOC and tried to lowball even that.

I’ve spoken with companies that claim we have nothing of value to offer, then moments later offer to bring me on in “any capacity.” I’ve been told “name my position” many times. The mental disconnect required for people to make these contradictory statements is absolutely staggering.

You go through this “woulda, coulda, shoulda” thing. You think through all of it. How, if only, X had happened, then Y. You look for where you messed up. And realize that as hard as you are trying to blame yourself for this, you weren’t the only actor in the saga, that the other actors also had choices. You can argue that they made poor choices (and fundamentally, the number of phone calls you’ve fielded from people regretting the choices they’ve made which have involved buying “competitive” systems versus yours … this is the very definition of opportunity cost).

I’ve learned how to pull rabbits out of hats. I always knew there was a finite supply of rabbits and hats. And someday, I would not be able to do that anymore.
Going through this in your mind, you wonder, could you have found another rabbit to pull out of another hat.

And then you realize that you couldn’t have.

That day happened. The hats, and rabbits, were gone.

This has been a ride. A wild ride. A fun, fantastic, terrifying, gut wrenching, stomach turning, frustrating, frightening ride. This has been a humbling experience. I’ve learned a tremendous amount, about human nature, the nature of risk, the nature of competition, how people make decisions, and so forth. I’ve learned how hard you have to work to get customers to buy. How much effort you have to undertake. How much pain you have to absorb, smiling, while you deal with failures you have nothing to do with, but still fall … upon … you.

So here we are. At the end of the process. Life support has been removed. We are just waiting for the rest of nature to take its course.

Selling off the assets, IP, brands. joe.landman at the google mail thingy if you want to talk.

* A new Landman’s law is coming: “Any answer from a VC that is not yes, and not immediately followed by a wire transfer, is a no. If VCs tell you anything other than yes, it is a no. If VCs can’t make up their mind in less than a week, it is a no.”

Viewed 122556 times by 8208 viewers

Some updates coming soon

I should have something interesting to talk about over the next two weeks, though a summary of this is Scalable Informatics is undergoing a transformation. The exact form of this transformation is still being determined. In any case, I am no longer at Scalable.

Some items of note in recent weeks.

1) M&A: Nimble was purchased by HPE. Not sure of the specifics of “why”, other than HPE didn’t have much in this space. HPE also tends to destroy what it buys (either accidentally by rolling over it, ala ibrix, or on purpose).

Simplivity was also purchased by HPE. This gets them a HC stack. But … same comment as above.

Intel has been on an acquisition tear. Buying folks to try to get into driverless cars, defeat NVidia, etc.

2) scaling back: DSSD was killed. NVMe over Fabrics. SAN for NVMe. At first I thought this was a good idea. I no longer think this. SANs are generally terrible ideas, a 1990s/2000s architecture with a number of critical assumptions that have not been true for a while (at least a decade) now.

3) government budgets: Scaling back across the board, apart from defense. See this link for more. Some are yelling that the sky is falling. Others realize that this is not true, but more correctly understand that fiscal restraint now, while painful, will prevent far harder austerity in the future. Though there are those whom are turning this hard economic reality into a political fight (and they shouldn’t, but they are).

4) The economy is looking up in the US. This seems to be correlated with the election (which, I am still trying to understand). Real hiring, not the faux version we saw over the last 8 years, is on a tear. There are reductions in U6 (aka the real unemployment numbers). This is, surprising, but positive.

Viewed 62863 times by 5753 viewers

Best comment I’ve seen in a bug report about a tool

So … gnome-terminal has been my standard cli interface on linux GUIs for a while. I can’t bring myself to use KDE for any number of reasons. Gnome itself went in strange directions, so I’ve been using Cinnamon atop Mint and Debian 8.

Ok, Debian 8. Gnome-terminal. Some things missing when you right mouse button click. Like “open new tab”. Open new window is there. This works. But no tab entry. Its been there, like … forever.

What happened to it?

Google google google, and I find this page. This page has a comment on it, where the person commenting must have been dead-panning.

The only explanation for removing the menu item “File/Open Tab” is, that maintainers of gnome-terminal do not use Gnome terminal themselves. Also in 3.18.3 (Ubuntu 16.04) if I expand the terminal size to maximum (double click on the menu bar) it would not return back to its previous size when double clicked again.

Yeah … this may sum it up.

Looking at this, I get a sense of “lets remove something from the code base to simplify things. Wait, it was useful? Who uses this?”

I dunno … everybody?

Cinnamon is the fork of Gnome I use, as it is usable (as compared to the main branch). Most of my desktops are Linux Mint, but for a number of reasons, I need to have this environment be nearly identical to servers I build, hence the same base distro. Which also means I get to see some of the crazy changes people make to code, in the interests of being “better”. With often the exact opposite effect.

Serious #headpalm and #headdesk on this.

If you are going to change someone’s UI/UX/workflow, make sure it is a meaningful/useful change. Change for changes sake … is not good.

Viewed 56540 times by 4690 viewers

structure by indentation … grrrr ….

If you have to do this:

:%s/\t/    /g

in order to get a very simple function to compile because of this error

  File "./", line 13
    return sum
IndentationError: unindent does not match any outer indentation level

even though your editor (atom!!!!??!?!) wasn’t showing you these mixed tabs and spaces … Yeah, there is something profoundly wrong with the approach.

The function in question was all of 10 lines. The error was undetectable in the atom editor. Vim saves the day yet again, but … it … shouldn’t … have … to …

Viewed 54060 times by 4793 viewers

What is old, is new again

Way back in the pre-history of the internet (really DARPA-net/BITNET days), while dinosaur programming languages frolicked freely on servers with “modern” programming systems and data sets, there was a push to go from a static linking programs to a more modular dynamic linking. The thought processes were that it would save precious memory, not having many copies of libc statically linked in to binaries. It would reduce file sizes, as most of your code would be in libraries. It would encourage code reuse, which was (then) widely seen as a strong positive approach … one built applications by assembling interoperating modules with understandable APIs. You wrote glue logic to use them, and built ever better modules.

The arguments against this had to do with a mixture of API versioning (what if the function/method call changed between versions, or was somehow incompatible, or the API endpoint changed so much it went away … to security … well sort of. The argument, though not fully appreciated how powerful it was at the time, was that rogue code libraries could do nefarious things with these function calls, as there was no way to verify ahead of time, the veracity of the libraries, or the code calling them.

The latter point was prescient. And we’ve still not fully mapped out the nature of the exploits possible with LD_PRELOAD type attacks. You don’t need to hijack the source code, just change the library search path, injecting your own code ahead of the regular library code.

That is, your attack surface is now gigantic.


But for the moment, I’ll ignore that. I’ll simply focus on two aspects of static vs dynamic linking I find curious in this day and age.

First, a language growing in popularity within Google and a few other quarters is Go. Go offers, to some degree, a simplified programming model. Not nearly as much boilerplate as Java (yay!), and they make a number of things easier to deal with (multi-processing bits using channels, etc.). While it has a number of very interesting features, one of the aspects of go I find very interesting is that, by default, it appears to emit a statically linked binary … well … mostly statically linked. Very few external dependencies.

Here’s an example using code.

        # ldd minio (0x00007ffe605cc000) => /lib/x86_64-linux-gnu/ (0x00007f1227da3000) => /lib/x86_64-linux-gnu/ (0x00007f12279f8000)
	/lib64/ (0x000055e70fc40000)

This means, as long as the c, pthread, and ABI aspects don’t change, this code should be able to run on pretty much any linux kernel supporting x86-64 ABI.

Why is this important.

Very trivial application deployment.

No, really. Very trivial.

Ok, not all of this is due to Go’s propensity to be all-inclusive with a minimal footprint outside of itself. Some of this comes from thoughtful application design … you make environmental setup part and parcel of the application startup. If it doesn’t see an environment, it creates one (~/.minio). Which, with the next enhancement, makes it very easy to deploy fully functional, fully configured software appliances.

As a quick comparison, Minio’s purpose in life it to provide an S3 object store with significant AWS compatibility. And to make this drop dead simple. Compare this to a “similar” system (that I also like) in Ceph. Ceph provides many things, has a huge development group, has Red Hat behind it, and lots of vendor participation in their processes. Deployment of Ceph is anything but simple. There are a few efforts in play to try to make it simpler (see below), but nothing I’ve seen so far is either as good as Minio, or even working at this stage for that matter.

Again, to be fair (and I like/use both Ceph and Minio), Ceph is aiming for a bigger scope problem, and I expect greater complexity. Minio is a more recent development, and has exploited newer technologies to enable a faster bring-up.

The second aspect I find interesting in the static vs dynamic linking … are the way containers are built in Linux with Docker and variants. Here, instead of simply deploying a code within a zone, a partial installation of dependencies for that code is made in the container. This results in often, very large containers. There have been efforts to slim these down, using Alpine Linux, and various distros working on building minimal footprint base lines. This latter element, the minimal footprint baseline, runs a stark counter to the previous massive dependency radii that they’ve all grown to love over the last N years. In Red Hat/CentOS, you can’t install qemu/kvm without installing gluster. Even if you will never use gluster. Which means you have a whole lotta dead code in your installations you will never use in many cases.

This dead code is an attack surface. Having it there really …. really …. doesn’t help you, and means you have a much larger perimeter to defend.

But back to my point about containers … the way we are building containers in Linux is … effectively … statically linking application code with dynamic linking, to their .so/.dlls that they need to support this, and then defining the interface between the two namespaces. This is why the containers are so damned bulky. Because we are effectively statically linking the apps … after they have been dynamically linked.

Back to Minio for a moment. They provide an S3 compatible object stack, with erasure coding, replication, web management, etc. … many features of the Ceph object store … in this sized package:

        # ls -alFh minio 
        -r-x------ 1 root root 24M Feb 15 20:47 minio

Deployment is

# ./minio help
  Minio - Cloud Storage Server.

  Minio is an Amazon S3 compatible object storage server. Use it to store photos, videos, VMs, containers, log files, or any blob of data as objects.

  minio [FLAGS] COMMAND [ARGS...]

  server   Start object storage server.
  version  Print version.
  update   Check for a new software update.
  help, h  Shows a list of commands or help for one command
  --config-dir value, -C value  Path to configuration directory. (default: "/root/.minio")
  --quiet                       Disable startup information.
  --help, -h                    show help
  --version, -v                 print the version

Yeah. That’s not hard at all. Maybe there is something to this (nearly) static linking of applications.

Note by the way, there are a few Ceph-in-a-container projects running around, though honestly I am not anticipating that any of them will bear much fruit. There are also some “embedded” ceph projects, but similar comments there … I’ve played with many of them, and IMO the best way to run Ceph is as a stand-alone app with all the knobs exposed. Its complex to configure, but it works very well.

Minio targets a somewhat overlapping but different scenario, and one that is very intriguing.

Viewed 79339 times by 6748 viewers

That was fun: mysql update nuked remote access

Update your packages, they said.

It will be more secure, they said.

I guess it was. No network access to the databases.

Even after turning the database server instance to listen again on the right port, I had to go in and redo the passwords and privileges.

So yeah, this broke my MySQL instance for a few hours. Took longer to debug as it was late at night and I was sleepy, so I put it off until morning with caffeine.

I know containers are all the rage now (and I’ve been a proponent of that for a while), but this was a bare metal system running the database, with a bunch of VM based services (I want stronger isolation guarantees than I can get out of docker and related things on linux … I know I know, use SmartOS … planning to for some other stuff, as I have somewhat more time to play/learn/do ).

Still … surprises like this … not so good. Goes back to my theory that distributions should have as small an install as possible, with services offered as VMs and/or containers. So software updates can be trivially rolled back if and when … they break something.

I did this in the past at Scalable Informatics with the software defined appliances, where the entire OS image can be rolled forward/backwards with a simple reboot, as it was immutable. Really a distro needs to be this … the whole concept of a bare metal install should be one of absolutely minimal footprint, with hooks to enable modular services/functions/features.

Viewed 70695 times by 5764 viewers

An article on Rust language for astrophysical simulation

It is a short read, and you can find it on arxiv. They tackled an integration problem, basically using the code to perform a relatively simple trajectory calculation for a particular N-body problem.

A few things lept out at me during my read.

First, the example was fairly simplistic … a leapfrog integrator, and while it is a symplectic integrator, this particular algorithm not quite high enough order to capture all the features of the N-body interaction they were working on.

Second, the statically compiled Rust was (for the same problem/inputs) about equivalent to if not slighty faster than the Fortran code in performance, much faster than C++, and even more so than Go. This could be confirmation bias on my part, but I’ve not seen many places where Go was comparable in performance with C++, usually factors of 2 off or worse. There could be many reasons for this, up to, and including poor measurement scenarios, allowing the Go GC to run in critical sections, etc. I know C++ should be running near Fortran in performance, but generally, I’ve not seen that either as the general case. Usually it is a set of fixed cases.

The reason for Fortran’s general dominance comes from the 50+ years people have had to write better optimizers for it, and that the language is generally simpler, with fewer corner cases. That and the removal of dangerous things like common blocks and other global state. This said, I fully expect other codes to equal and surpass it soon on a regular basis. I have been expecting Julia to do this in fairly short order. I am heartened to see Rust appear to do this on this one test, though I personally reserve my opinion on this for now. I’d like to see more code do this.

Rust itself is somewhat harder to adapt to. You have to be more rigid about how you think about variables, how you will use them, and what mechanisms you can use to shuttle state around. You have to worry about this a bit more explicitly than other languages. I am personally quite intrigued by its promises: zero cost abstraction, etc. The unlikelihood of a SEGV hitting again is also quite nice … tracing those down can often be frustrating.

My concern is the rate at which Rust is currently evolving. I started looking at it around 0.10.x, and it is (as of this writing) up to 0.14.x with 0.15.x looming.

Generally, I want languages that get out of my way of doing things, not languages that smother me in boilerplate of marginal (at best) utility, and impose their (often highly opinionated, and highly targeted, often slightly askew) worldview on me. Anything with an explicit garbage collection blocking task which can interfere with calculation is a terrific example of this.

Simple syntax, ease of expression, ability to debug, high performance, accurate results, and freedom from crashing with (*&^*&^$%^&$ SEGVs are high on my list of languages I want to spend time with.

Viewed 54977 times by 5193 viewers