One door has closed, another has opened

As I had written previously, my old company, Scalable Informatics, has closed. Read that posting to see why and how, but as with all things … we must move forward.

It is cliche’ to use the title phrase. But it is also true. We know the door that closed. It’s the door that has opened afterwards that I am focusing upon.

I have joined Joyent to work on, as it turns out, many similar things to what I did at Scalable. Building and supporting awesome hardware, helping provide operational capability to end users for the platform, working with partners and vendors to bring value to the Joyent public cloud, and many other things.

Due to the nature of what I’ll be working on, there is a very strong HPC component to it, though I don’t think the team is using the HPC word quite yet … that is fine. I’ve not left the market, I’ve not redefined the market so I wouldn’t leave it.

Joyent, for those who don’t know, have a set of very interesting product offerings, and a tremendous team around them. As a company, Joyent was a startup, recently acquired by Samsung Mobile. This provides part of what I craved, stability.

As I had noted in my previous posts, I was getting tired of running as hard as possible and then head first into challenges that had nothing whatsoever to do with technical merit, deal economics, etc. but had everything to do with … well … useless and unreasonable measures of perception and stability.

I don’t have those issues here.

Moreover, I have a strong sense of where I can contribute to the success of the team, the mission objectives, etc.

From a technical scenario, the approach to building a data center as a container engine/platform is absolutely brilliant. Your deployment winds up being trivial. Your performance, since you are running on bare metal, is tremendous.

That is one of the things that really drew me to Joyent years ago. There’s a right way to do containers/VMs and ways that aren’t right. Joyent does these right.

No system is perfect, all have challenges of one sort or the other. I am well aware of the challenges for the Triton platform, but I am simultaneously, far less concerned about them, as compared to other platforms.

In the lead up to Joyent, I spoke to many other groups about positions, and I’ll have a writeup on that later (without naming names, just talking about the process and some of the odd things that came up).

For an old HPC hand like me, I was at least a little amused by some of the other groups insistence that you didn’t need feature X for HPC and HPC like things … when the HPC world had pretty clearly spoken over the last decade in demanding feature X.

The position boiled down to “well, we don’t have it, so no one needs it.” Which, if you look at this, was marketing drivel being repeated by engineering resources.

Contrast to to Joyent. Who detailed for me “we are here” and “this is where we need to be”. With a very well understood path between the two locations. There’s no “we know better”. There was quite a bit of “what do we have to do to get better, and be better.”

This is what I like. The team is phenomenal. Really. The tech … I’ve gushed on about it in the past in these pages … but its very good. There’s a few tech challenges, but the cool thing is that I get to help make those less of a concern by helping to find solutions to them. Which I’ve done once before for this platform. And now I have a wider remit, and don’t have to worry myself with making my monthly numbers to pay the team.

You see why I like this yet?

Add to this, some of the needs from the parent company. Just like in the original Jurassic park

… I know this stuff. I lived and breathed the stuff they need for most of my professional life.

And did I mention the awesome team? Great products? Incredible market and market ops?

It took a friend to reconnect me with the team. What was supposed to be a 30 minute call wound up taking all afternoon. I was positively giddy after it.

And I am now onboard. Running as hard as I can to catch up and not slow them down. Hopefully to help them accelerate, even harder and faster than before.

This is a good thing, stay tuned!

Note: I won’t be able to blog much about internal things (as I had hinted at, at Scalable). But I will likely drop hints on things that I am allowed to talk about, or that people might wish to pay attention to.

And remember that I’ve not left HPC, nor the market behind. This role encompasses everything I’ve done before … plus quite a bit more stuff.

This is good. Very good!

Viewed 43153 times by 3561 viewers

Hard disk shipments dropped 10% QoQ, 2% YoY

This jives very well with what I’ve observed. Decreasing demand for enterprise storage hard disks, or as I call them “Spinning Rust Drives” (or SRD) as compared with SSD (Solid State Drives).

The summary is here with a key quote being

3.5″/2.5″ Enterprise HDDs: For a second consecutive quarter, total enterprise HDD sales declined. Both performance enterprise and nearline HDD volumes were down for the quarter, as
total unit shipments fell to approximately 16-17 million.

Again, jives well with what I’ve observed.

Mellanox has a good take on its blog, noting that

Flash keeps taking over

Every year, for the past four years, has been ?The Year Flash Takes Over? and every year flash owns a growing minority of storage capacity and spend, but it?s still in the minority. 2017 is not the year flash surpasses disk in spending or capacity ? there?s simply not enough NAND fab capacity yet, but it is the year all-flash arrays go mainstream. SSDs are now growing in capacity faster than HDDs (15TB SSD recently announced) and every storage vendor offers an all-flash flavor. New forms of 3D NAND are lowering price/TB on one side to compete with high capacity disks while persistent memory technologies like 3D-XPoint (while not actually buillt on NAND flash) are increasing SSD performance even further above that of disk. HDDs will still dominate low price, high-capacity storage for some years, but are rapidly becoming a niche technology.

This is a critical point. While SRD are dropping in volume, there is not enough SSD fab capacity to supply the market demand. Which, curiously, means that economics 101 is strong and we see a healthy supply/demand curve in play. Moreover, entry into this market (e.g. building fabs for NAND) is prohibitively expensive. This means that the SSD supply will be constrained for the foreseeable future.

This also opens the way up to other mass producable technologies. Given the contenders to replace NAND for SSD, I’d argue that the combination of the least expensive and most easily licensed/reproduced system (which might be NAND) will be the high growth bit for a while.

Apart from that fab entry cost.

Without reading the tea leaves too hard, it is pretty clear that SRD are headed in the direction of archives, and colder storage. Tape is still around for now, though, I am not sure if anyone will be looking at it in 2-5 years time frame … serialized storage technologies that don’t do reasonable jobs on seeky loads might not fare well.

Possibly SSD/Tape hybrids, with the SSD providing not really a cache per se, but a fractional capacity tier, and many parallel tape drives providing some semblence of many heads on a disk … but the issue then is that seek times to new “sectors” takes 102 to 103 seconds, as compared to a hard disk, where seeking is around 10-2 seconds. Thats 4 to 5 orders of magnitude difference, and for active archives, I can’t imagine a potentially non-serial workload being deployed on such a device.

Even backups … you have the storage bandwidth wall (SBW) problem, where you have a small pipe bandwidth relative to your capacity. SBW measures time to move an amount of data.

$SBW = Capacity / Bandwidth$

1PB of data, at 100 MB/s (about the real writing rate of tapes) is about 107 seconds, or 1/3 of a year. Do you really want that for your read/write cycle? Even 10 units running in parallel for 1GB/s is 1/30 of a year, or about 12 days. And your data keeps growing.

No, tape has a limited lifetime going forward IMO. Pipe size (the B) is one of the major issues, seek rate being the other. Low cost doesn’t matter if you can’t get your data off of it fast enough when you need to.

What I think I see going forward is people basically “sloshing” their data between storage systems, with HDD playing a large role going forward. Larger denser HDD could make integrating archive and backup fairly simple into an all flash storage system. Not as tiers (data motion is your enemy, as it reduces the usable value of Bandwidth in the SBW equation).

So I expect a long tail on the HDD. They won’t go away any time soon. Well, not all of them. 10k and 15k RPM drives are probably done for.

Viewed 46635 times by 3409 viewers

Selling #HPC things on ebay

Given that the (now former) day job has ended, I am selling some of the old day job’s assets on ebay. We’ve sold some siFlash, Unison, and have current listings for Arista and Mellanox switches. More stuff will be listed in short order, check it out here. Feel free to reach out to me at joe.landman at the google mail thingy if you want to talk about any of these things, or buy before I list them. Literally everything must go, no reasonable offers rejected (on ebay or via email).

Viewed 53204 times by 3646 viewers

I always love these breathless stories of great speed, and how VCs love them …

Though, when I look at the “great speed”, it is often on par with or less than Scalable Informatics sustained years before.

From 2013 SC13 show, on the show floor, after blasting through a POC at unheard of speed, and setting long standing records in the STAC-M3 benchmarks …

Article in question is in the Register. Some of the speeds and feeds:

  • 200 microsecs latency
  • 45GBps read bandwidth
  • 15GBps write bandwidth
  • 7 million IOPS

But then … a fibre connection. And … its an array. Not an appliance. So deduct points.

Respectable read bandwidth, chances are they are doing this as reading compressed data, and then counting the uncompressed data as what was read, missing the decompression step. Write perf is low, should be higher. Would need more data on the IOPs to say one way or the other, how did they measure, etc.

FWIW, in 2012/2013, Scalable Informatics sustained 30+ GB/s read bandwidth on our siFlash unit for 128 threads of IO, and about 3M IOPs for 128 threads of random 8k reads. In 2015, we hit 24GB/s and 5M IOPs on v1 of Forte. v2 of Forte never saw the light of day because we ran out of money. Specs on that (estimated) are 50GB/s 10M IOPs in a 2U container.

And Scalable never got any VC love or cash. A shame, because we set very long standing records in a number of areas, that others are, to some degree, still catching up to, years later.

I’m reminded of a little bit of revisionist history put out by IBM at the time, that storage blogger Robin Harris recalled in his great StorageMojo blog.

Here is a paraphrase

They Say (random VC backed “performance” storage stealthy startup) Entry Into high performance storage Will Legitimize The Market.

The Bastards Say, Welcome.

Viewed 55249 times by 3808 viewers

pcilist: because sometimes you really, really need to know how your PCIe devices are configured

If you don’t know what I am talking about here, that’s fine. I’ll assume you don’t do hardware, or you call someone else when there is a hardware problem.

If you think “well gee, don’t we have lspci? so why do we need this?” then you probably have not really tried to use lspci to find this information, or didn’t know it was available.

Ok … what I am talking about.

When a PCIe bus comes up, the connections negotiate with the PCIe hub. The negotiate width (e.g. how many lanes of PCIe will they consume), speed (e.g. signalling speed in terms of GT/s), interrupts, etc. The PCIe hub presents this information to the OS, though in some cases, OSes like Linux choose to enumerate/walk the PCIe tree themselves … because … you know … BIOS bugs.

Ok. So these devices all autonegotiate as part of their initialization. Every now and then, you get a system where a card autonegotiates speeds or widths lower than expected. The driver generally provides the information on what it is capable of, and the PCIe hub, or OS structure, tells you the actual state.

Why is this important … you might rightly ask ?

So you have two machines. They connect over a simple network. The network speed is lower than the PCIe speed when the unit is operating at full capability. One simple estimate of the maximum possible speed of a PCIe system is take the number of GT/s and multiply it by the width. Divide that by 10. That is your approximate bandwidth in GB/s.

So your two machines have fast network cards (note this also works for HBAs … heck … everything … though … be careful about the power control systems, as they may mess with some of these things). You start using iperf to generate traffic between the two machines. And you see it is way below where you expect.

So, you start looking for why this is the case.

Latest drivers: check
Up to date kernel: check
Switch behaving well: check

Hmmm …. Something is amiss.

Then you try between other machines. Every other machine to the non-suspect machine is giving you reasonable numbers.

The suspect machine is giving you crappy numbers to/from it.

In the network scenario, you also see many errors/buffer overruns. Which means that the kernel can’t empty/fill buffers fast enough. Which suggests some odd speed issue.

Ok … where do you look, and what do you look for?

Pat yourself on the back if your hand shot up and you said, with confidence, ‘lspci’. Or parsing the /sys/… tree by hand. Either will work. Lets focus on lspci for the moment.

Ok, great. Now what information within lspci output do you want, and which options do you use?

They suggest -m or -mm for machine parseable output.

I am going to avoid those options. Try them, and see why for yourself.

You see, to get the juicy bits you need, you will need to give 3 v’s. -vvv . And to get a little more info, add a -kb to get driver and other info.

Now, look at that joyus output. Again, what info do you need?

Look at LnkCap: and LnkSta:

That’s what you need.

Wouldn’t it be nice if this were output in a nice simple, tabular form … so you could … I dunno … see your problem right away?

Well, your long wait is over! For only $19.95, and a quick trip to, you too can grab all this info incredibly quickly. Don’t believe me? Well then, have a gander:

landman@leela:~/work/development/pcilist$ sudo ./ 
PCIid   MaxWidth ActWidth MaxSpeed ActSpeed     driver       description
00:00.0        4        0        5                           Intel Corporation Haswell-E DMI2 (rev 02)
00:01.0        8        0        8      2.5         pcieport Intel Corporation Haswell-E PCI Express Root Port 1 (rev 02) (prog-if 00 [Normal decode])
00:02.0        8        1        8      2.5         pcieport Intel Corporation Haswell-E PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
00:02.2        8        8        8        8         pcieport Intel Corporation Haswell-E PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
00:03.0        8        0        8      2.5         pcieport Intel Corporation Haswell-E PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
00:03.2        8        8        8        5         pcieport Intel Corporation Haswell-E PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
00:1c.0        1        0        5      2.5         pcieport Intel Corporation Wellsburg PCI Express Root Port #1 (rev d5) (prog-if 00 [Normal decode])
00:1c.4        4        1        5      2.5         pcieport Intel Corporation Wellsburg PCI Express Root Port #5 (rev d5) (prog-if 00 [Normal decode])
02:00.0        1        1      2.5      2.5    snd_hda_intel Creative Labs SB Recon3D (rev 01)
03:00.0        8        8        8        8          mpt3sas LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
05:00.0        8        8        5        5            ixgbe Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
05:00.1        8        8        5        5            ixgbe Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
07:00.0        1        1      2.5      2.5                  ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03) (prog-if 00 [Normal decode])
80:02.0        8        8        8        8         pcieport Intel Corporation Haswell-E PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
80:03.0       16       16        8      2.5         pcieport Intel Corporation Haswell-E PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
82:00.0       16       16        8      2.5           nvidia NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2) (prog-if 00 [VGA controller])
82:00.1       16       16        8      2.5    snd_hda_intel NVIDIA Corporation Device 0fbc (rev a1)

Notice here how the NVidia card throttled down. When you start using it actively, it throttles up in speed.

But, if you have a nice 40GbE card, say an mlx4_en based card, and you see 5GT/s and x4 on the width, that gets you to about 2GB/s maximum. So you’ll see somewhat less than that on your network.

This is what I saw today. And I wanted to make it easy to spot going forward.

Viewed 62583 times by 4325 viewers


She's dead Jim!

This is the post an entrepreneur hopes to never write. They pour their energy, their time, their resources, their love into their baby. Trying to make her live, trying to make her grow.

And for a while, she seems to. Everything is hitting the right way, 12+ years of uninterrupted growth and profitable operation as an entirely bootstrapped company. Market leading … no … dominating … from the metrics customers tell you are important … position.

And then you get big enough to be visible. And you realize you need more capital to grow faster.

And you can’t get it, because, well … Detroit? 12 years of operations and you aren’t google sized?

Then you try to find capital. You go through many paths. One group complains that Pure storage is younger than you so you should be bigger than them, and you have a serious case of WTF. Another sets up what looks to be an awesome deal, until they demand all your IP for free forever, to do whatever they want with. Then another group … . VCs tell you things, you pitch them, and they can’t get to yes*

Well, lets just say this is the non-fun side of things.

Or when you go visit customers, they love your product, the benchmark results are mind blowing … and then the CIO/CFO kills the deal because, you know … you are too small.

And then, while you are talking to potential acquirers, your bank holding your LOC decides to blow you up. In the middle of the conversation. Literally.

Did I mention that this was the non-fun side of things?

You realize that while you may have been able to pull rabbits out of hats for the last 14 years, that you just hit a perfect storm. A veritable cartoon buzzsaw from which there really was no escape.

Your baby … the one you put so much energy, time, personal money into. Your baby was dying.

And there was little you could do to stop it.

There is a surreal quality to this. One not well captured in words. Not unlike an observer in a different temporal reference frame (e.g. you clock is running at a different speed), while all around you things are happening in real time.

There are the 5 stages of grief and coping.

You reflect on what you started, on what you contributed to. Small sampling would be defining and using the APU term for AMD, building tightly coupled computing and storage systems before hyperconvergence was a thing, designing accelerators and a whole market/sales plan before GPGPU was a thing, getting major customers signed up to a grid concept before there was a grid, and more than a year before there was AWS.

You reflect on your failures, or things you failed at. Raising capital in 2002-2006 to build accelerators. Raising capital in 2004-2006 to build a cloud. Raising capital in 2008 to build extreme density machines. Raising capital in 2011-2013 to build tightly coupled systems. Raising capital in 2014-2016 to build appliances.

You reflect on your successes. A small sampling: Taking a business from zero to multiple million dollar revenue without external investment (though I did try to get it). Setting very hard to beat benchmark numbers by designing/building the best architecture in market. Solving significant automation, support problems with better implementations, and skipping broken subsystems.

We’ve had people offer to “buy” us. For free. Come work for us, they said, and maybe, maybe we’ll pay you. Others have asked the amount on our LOC and tried to lowball even that.

I’ve spoken with companies that claim we have nothing of value to offer, then moments later offer to bring me on in “any capacity.” I’ve been told “name my position” many times. The mental disconnect required for people to make these contradictory statements is absolutely staggering.

You go through this “woulda, coulda, shoulda” thing. You think through all of it. How, if only, X had happened, then Y. You look for where you messed up. And realize that as hard as you are trying to blame yourself for this, you weren’t the only actor in the saga, that the other actors also had choices. You can argue that they made poor choices (and fundamentally, the number of phone calls you’ve fielded from people regretting the choices they’ve made which have involved buying “competitive” systems versus yours … this is the very definition of opportunity cost).

I’ve learned how to pull rabbits out of hats. I always knew there was a finite supply of rabbits and hats. And someday, I would not be able to do that anymore.
Going through this in your mind, you wonder, could you have found another rabbit to pull out of another hat.

And then you realize that you couldn’t have.

That day happened. The hats, and rabbits, were gone.

This has been a ride. A wild ride. A fun, fantastic, terrifying, gut wrenching, stomach turning, frustrating, frightening ride. This has been a humbling experience. I’ve learned a tremendous amount, about human nature, the nature of risk, the nature of competition, how people make decisions, and so forth. I’ve learned how hard you have to work to get customers to buy. How much effort you have to undertake. How much pain you have to absorb, smiling, while you deal with failures you have nothing to do with, but still fall … upon … you.

So here we are. At the end of the process. Life support has been removed. We are just waiting for the rest of nature to take its course.

Selling off the assets, IP, brands. joe.landman at the google mail thingy if you want to talk.

* A new Landman’s law is coming: “Any answer from a VC that is not yes, and not immediately followed by a wire transfer, is a no. If VCs tell you anything other than yes, it is a no. If VCs can’t make up their mind in less than a week, it is a no.”

Viewed 70672 times by 5532 viewers

Some updates coming soon

I should have something interesting to talk about over the next two weeks, though a summary of this is Scalable Informatics is undergoing a transformation. The exact form of this transformation is still being determined. In any case, I am no longer at Scalable.

Some items of note in recent weeks.

1) M&A: Nimble was purchased by HPE. Not sure of the specifics of “why”, other than HPE didn’t have much in this space. HPE also tends to destroy what it buys (either accidentally by rolling over it, ala ibrix, or on purpose).

Simplivity was also purchased by HPE. This gets them a HC stack. But … same comment as above.

Intel has been on an acquisition tear. Buying folks to try to get into driverless cars, defeat NVidia, etc.

2) scaling back: DSSD was killed. NVMe over Fabrics. SAN for NVMe. At first I thought this was a good idea. I no longer think this. SANs are generally terrible ideas, a 1990s/2000s architecture with a number of critical assumptions that have not been true for a while (at least a decade) now.

3) government budgets: Scaling back across the board, apart from defense. See this link for more. Some are yelling that the sky is falling. Others realize that this is not true, but more correctly understand that fiscal restraint now, while painful, will prevent far harder austerity in the future. Though there are those whom are turning this hard economic reality into a political fight (and they shouldn’t, but they are).

4) The economy is looking up in the US. This seems to be correlated with the election (which, I am still trying to understand). Real hiring, not the faux version we saw over the last 8 years, is on a tear. There are reductions in U6 (aka the real unemployment numbers). This is, surprising, but positive.

Viewed 36816 times by 3182 viewers

Best comment I’ve seen in a bug report about a tool

So … gnome-terminal has been my standard cli interface on linux GUIs for a while. I can’t bring myself to use KDE for any number of reasons. Gnome itself went in strange directions, so I’ve been using Cinnamon atop Mint and Debian 8.

Ok, Debian 8. Gnome-terminal. Some things missing when you right mouse button click. Like “open new tab”. Open new window is there. This works. But no tab entry. Its been there, like … forever.

What happened to it?

Google google google, and I find this page. This page has a comment on it, where the person commenting must have been dead-panning.

The only explanation for removing the menu item “File/Open Tab” is, that maintainers of gnome-terminal do not use Gnome terminal themselves. Also in 3.18.3 (Ubuntu 16.04) if I expand the terminal size to maximum (double click on the menu bar) it would not return back to its previous size when double clicked again.

Yeah … this may sum it up.

Looking at this, I get a sense of “lets remove something from the code base to simplify things. Wait, it was useful? Who uses this?”

I dunno … everybody?

Cinnamon is the fork of Gnome I use, as it is usable (as compared to the main branch). Most of my desktops are Linux Mint, but for a number of reasons, I need to have this environment be nearly identical to servers I build, hence the same base distro. Which also means I get to see some of the crazy changes people make to code, in the interests of being “better”. With often the exact opposite effect.

Serious #headpalm and #headdesk on this.

If you are going to change someone’s UI/UX/workflow, make sure it is a meaningful/useful change. Change for changes sake … is not good.

Viewed 35493 times by 3098 viewers

structure by indentation … grrrr ….

If you have to do this:

:%s/\t/    /g

in order to get a very simple function to compile because of this error

  File "./", line 13
    return sum
IndentationError: unindent does not match any outer indentation level

even though your editor (atom!!!!??!?!) wasn’t showing you these mixed tabs and spaces … Yeah, there is something profoundly wrong with the approach.

The function in question was all of 10 lines. The error was undetectable in the atom editor. Vim saves the day yet again, but … it … shouldn’t … have … to …

Viewed 36903 times by 3319 viewers

What is old, is new again

Way back in the pre-history of the internet (really DARPA-net/BITNET days), while dinosaur programming languages frolicked freely on servers with “modern” programming systems and data sets, there was a push to go from a static linking programs to a more modular dynamic linking. The thought processes were that it would save precious memory, not having many copies of libc statically linked in to binaries. It would reduce file sizes, as most of your code would be in libraries. It would encourage code reuse, which was (then) widely seen as a strong positive approach … one built applications by assembling interoperating modules with understandable APIs. You wrote glue logic to use them, and built ever better modules.

The arguments against this had to do with a mixture of API versioning (what if the function/method call changed between versions, or was somehow incompatible, or the API endpoint changed so much it went away … to security … well sort of. The argument, though not fully appreciated how powerful it was at the time, was that rogue code libraries could do nefarious things with these function calls, as there was no way to verify ahead of time, the veracity of the libraries, or the code calling them.

The latter point was prescient. And we’ve still not fully mapped out the nature of the exploits possible with LD_PRELOAD type attacks. You don’t need to hijack the source code, just change the library search path, injecting your own code ahead of the regular library code.

That is, your attack surface is now gigantic.


But for the moment, I’ll ignore that. I’ll simply focus on two aspects of static vs dynamic linking I find curious in this day and age.

First, a language growing in popularity within Google and a few other quarters is Go. Go offers, to some degree, a simplified programming model. Not nearly as much boilerplate as Java (yay!), and they make a number of things easier to deal with (multi-processing bits using channels, etc.). While it has a number of very interesting features, one of the aspects of go I find very interesting is that, by default, it appears to emit a statically linked binary … well … mostly statically linked. Very few external dependencies.

Here’s an example using code.

        # ldd minio (0x00007ffe605cc000) => /lib/x86_64-linux-gnu/ (0x00007f1227da3000) => /lib/x86_64-linux-gnu/ (0x00007f12279f8000)
	/lib64/ (0x000055e70fc40000)

This means, as long as the c, pthread, and ABI aspects don’t change, this code should be able to run on pretty much any linux kernel supporting x86-64 ABI.

Why is this important.

Very trivial application deployment.

No, really. Very trivial.

Ok, not all of this is due to Go’s propensity to be all-inclusive with a minimal footprint outside of itself. Some of this comes from thoughtful application design … you make environmental setup part and parcel of the application startup. If it doesn’t see an environment, it creates one (~/.minio). Which, with the next enhancement, makes it very easy to deploy fully functional, fully configured software appliances.

As a quick comparison, Minio’s purpose in life it to provide an S3 object store with significant AWS compatibility. And to make this drop dead simple. Compare this to a “similar” system (that I also like) in Ceph. Ceph provides many things, has a huge development group, has Red Hat behind it, and lots of vendor participation in their processes. Deployment of Ceph is anything but simple. There are a few efforts in play to try to make it simpler (see below), but nothing I’ve seen so far is either as good as Minio, or even working at this stage for that matter.

Again, to be fair (and I like/use both Ceph and Minio), Ceph is aiming for a bigger scope problem, and I expect greater complexity. Minio is a more recent development, and has exploited newer technologies to enable a faster bring-up.

The second aspect I find interesting in the static vs dynamic linking … are the way containers are built in Linux with Docker and variants. Here, instead of simply deploying a code within a zone, a partial installation of dependencies for that code is made in the container. This results in often, very large containers. There have been efforts to slim these down, using Alpine Linux, and various distros working on building minimal footprint base lines. This latter element, the minimal footprint baseline, runs a stark counter to the previous massive dependency radii that they’ve all grown to love over the last N years. In Red Hat/CentOS, you can’t install qemu/kvm without installing gluster. Even if you will never use gluster. Which means you have a whole lotta dead code in your installations you will never use in many cases.

This dead code is an attack surface. Having it there really …. really …. doesn’t help you, and means you have a much larger perimeter to defend.

But back to my point about containers … the way we are building containers in Linux is … effectively … statically linking application code with dynamic linking, to their .so/.dlls that they need to support this, and then defining the interface between the two namespaces. This is why the containers are so damned bulky. Because we are effectively statically linking the apps … after they have been dynamically linked.

Back to Minio for a moment. They provide an S3 compatible object stack, with erasure coding, replication, web management, etc. … many features of the Ceph object store … in this sized package:

        # ls -alFh minio 
        -r-x------ 1 root root 24M Feb 15 20:47 minio

Deployment is

# ./minio help
  Minio - Cloud Storage Server.

  Minio is an Amazon S3 compatible object storage server. Use it to store photos, videos, VMs, containers, log files, or any blob of data as objects.

  minio [FLAGS] COMMAND [ARGS...]

  server   Start object storage server.
  version  Print version.
  update   Check for a new software update.
  help, h  Shows a list of commands or help for one command
  --config-dir value, -C value  Path to configuration directory. (default: "/root/.minio")
  --quiet                       Disable startup information.
  --help, -h                    show help
  --version, -v                 print the version

Yeah. That’s not hard at all. Maybe there is something to this (nearly) static linking of applications.

Note by the way, there are a few Ceph-in-a-container projects running around, though honestly I am not anticipating that any of them will bear much fruit. There are also some “embedded” ceph projects, but similar comments there … I’ve played with many of them, and IMO the best way to run Ceph is as a stand-alone app with all the knobs exposed. Its complex to configure, but it works very well.

Minio targets a somewhat overlapping but different scenario, and one that is very intriguing.

Viewed 43097 times by 3821 viewers