What is old, is new again

By joe

March 2, 2017 - 6 minutes read - 1127 words

Way back in the pre-history of the internet (really the DARPA-net/BITNET days), while dinosaur programming languages frolicked freely on servers with “modern” programming systems and data sets, there was a push to go from statically linked programs to more modular dynamic linking. The thinking was that it would save precious memory, by not having many copies of libc statically linked into binaries. It would reduce file sizes, as most of your code would be in libraries. It would encourage code reuse, which was (then) widely seen as a strongly positive approach … one built applications by assembling interoperating modules with understandable APIs. You wrote glue logic to use them, and built ever better modules.

The arguments against this ranged from API versioning (what if the function/method call changed between versions, or was somehow incompatible, or the API endpoint changed so much it went away?) to security … well, sort of. The security argument, though its power was not fully appreciated at the time, was that rogue code libraries could do nefarious things with these function calls, as there was no way to verify, ahead of time, the authenticity of the libraries or of the code calling them.

The latter point was prescient. We’ve still not fully mapped out the nature of the exploits possible with LD_PRELOAD type attacks. You don’t need to hijack the source code, just change the library search path, injecting your own code ahead of the regular library code. That is, your attack surface is now gigantic.

Ok. But for the moment, I’ll ignore that. I’ll simply focus on two aspects of static vs dynamic linking I find curious in this day and age.

First, a language growing in popularity within Google and a few other quarters is Go. Go offers, to some degree, a simplified programming model. Not nearly as much boilerplate as Java (yay!), and it makes a number of things easier to deal with (multi-processing bits using channels, etc.). While it has a number of very interesting features, one of the aspects of Go I find very interesting is that, by default, it appears to emit a statically linked binary … well … mostly statically linked. Very few external dependencies. Here’s an example using minio.io code.

    # ldd minio
        linux-vdso.so.1 (0x00007ffe605cc000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1227da3000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f12279f8000)
        /lib64/ld-linux-x86-64.so.2 (0x000055e70fc40000)
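The same kind of check can be made from Go itself. What follows is a minimal sketch, not anything taken from Minio: it uses the standard library's `debug/elf` package to list the shared libraries an ELF binary declares as needed (its DT_NEEDED entries). The helper name `neededLibs` is my own invention, and the sketch assumes a Linux/ELF system.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// neededLibs returns the shared libraries that the ELF file at path
// declares as dependencies (its DT_NEEDED entries). A fully static
// binary yields an empty list. The name is hypothetical, chosen for
// this example only.
func neededLibs(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.ImportedLibraries()
}

func main() {
	// Inspect this program itself unless a path is given as an argument.
	path, err := os.Executable()
	if len(os.Args) > 1 {
		path, err = os.Args[1], nil
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	libs, err := neededLibs(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(libs) == 0 {
		fmt.Printf("%s: no shared library dependencies\n", path)
		return
	}
	for _, lib := range libs {
		fmt.Printf("%s needs %s\n", path, lib)
	}
}
```

A pure-Go program built with cgo disabled (`CGO_ENABLED=0 go build`) should report no dependencies at all. Pointed at the minio binary above, I would expect it to report only libpthread.so.0 and libc.so.6, since the vdso and ld-linux lines in the ldd output come from the kernel and the ELF interpreter rather than from DT_NEEDED entries. Either way, the list of things the binary needs from the host is very short.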

This means, as long as the libc, pthread, and ABI aspects don’t change, this code should be able to run on pretty much any Linux kernel supporting the x86-64 ABI.

Why is this important? Very trivial application deployment. No, really. Very trivial.

Ok, not all of this is due to Go’s propensity to be all-inclusive with a minimal footprint outside of itself. Some of this comes from thoughtful application design … you make environmental setup part and parcel of the application startup. If it doesn’t see an environment, it creates one (~/.minio). Which, with the next enhancement, makes it very easy to deploy fully functional, fully configured software appliances.

As a quick comparison, Minio’s purpose in life is to provide an S3 object store with significant AWS compatibility, and to make this drop dead simple. Compare this to a “similar” system (that I also like) in Ceph. Ceph provides many things, has a huge development group, has Red Hat behind it, and lots of vendor participation in its processes. Deployment of Ceph is anything but simple. There are a few efforts in play to try to make it simpler (see below), but nothing I’ve seen so far is either as good as Minio, or even working at this stage for that matter. Again, to be fair (and I like/use both Ceph and Minio), Ceph is aiming at a bigger scope problem, and I expect greater complexity. Minio is a more recent development, and has exploited newer technologies to enable a faster bring-up.

The second aspect I find interesting in static vs dynamic linking … is the way containers are built in Linux with Docker and variants. Here, instead of simply deploying code within a zone, a partial installation of that code’s dependencies is made in the container. This often results in very large containers. There have been efforts to slim these down, using Alpine Linux, and various distros are working on building minimal footprint baselines.

This latter element, the minimal footprint baseline, runs starkly counter to the massive dependency radii that the distros have all grown to love over the last N years. In Red Hat/CentOS, you can’t install qemu/kvm without installing gluster. Even if you will never use gluster. Which means, in many cases, you have a whole lotta dead code in your installations that you will never use. This dead code is an attack surface. Having it there really … really … doesn’t help you, and means you have a much larger perimeter to defend.

But back to my point about containers … the way we are building containers in Linux is … effectively … statically linking dynamically linked application code to the .so/.dlls it needs, and then defining the interface between the two namespaces. This is why the containers are so damned bulky. Because we are effectively statically linking the apps … after they have been dynamically linked.

Back to Minio for a moment. They provide an S3 compatible object storage stack, with erasure coding, replication, web management, etc. … many features of the Ceph object store … in a package this size:

    # ls -alFh minio
    -r-x------ 1 root root 24M Feb 15 20:47 minio

Deployment is:

    # ./minio help
    NAME:
      Minio - Cloud Storage Server.
    DESCRIPTION:
      Minio is an Amazon S3 compatible object storage server. Use it to store photos, videos, VMs, containers, log files, or any blob of data as objects.
    USAGE:
      minio [FLAGS] COMMAND [ARGS...]
    COMMANDS:
      server   Start object storage server.
      version  Print version.
      update   Check for a new software update.
      help, h  Shows a list of commands or help for one command
    FLAGS:
      --config-dir value, -C value  Path to configuration directory. (default: "/root/.minio")
      --quiet                       Disable startup information.
      --help, -h                    show help
      --version, -v                 print the version
    VERSION:
      2017-02-16T01:47:30Z

Yeah. That’s not hard at all. Maybe there is something to this (nearly) static linking of applications.

Note, by the way, that there are a few Ceph-in-a-container projects running around, though honestly I am not anticipating that any of them will bear much fruit. There are also some “embedded” Ceph projects, but similar comments apply there … I’ve played with many of them, and IMO the best way to run Ceph is as a stand-alone app with all the knobs exposed. It’s complex to configure, but it works very well. Minio targets a somewhat overlapping but different scenario, and one that is very intriguing.