Interesting articles on systemd and ZFS

The systemd article is on LWN, and discusses the “tragedy” of it. The ZFS post was linked from HackerNews and discusses risk to ZFS’s future from the perspective of FreeBSD leveraging ZFS on Linux as its upstream.

OK, first onto systemd. For those who don’t know systemd, think of it as the Borg that ate init. And upstart. And … Basically, it is a replacement infrastructure for running services on Linux. It is more than that though, as it seeks to replace a number of critical things. And it often gets some simple things (badly) wrong.

But overall, it is a step in the right direction, which I have to grudgingly admit after working with it for a while. I don’t want it to be my DNS system, or any number of other things. This is more of a default configuration issue, one that, from what I’ve seen, most distros get wrong for anything other than the trivial desktop case.

The changes are manageable. And adding new service units is relatively easy. What’s hard, sometimes, is figuring out where to insert your unit, and which units to set up as dependencies and which ones to remove. The units represent a DAG, and if you mess up dependencies, you get the dreaded startup cycle timeouts. Or worse, on shutdown, it … doesn’t.
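To make the DAG point concrete, here is a minimal sketch of how a dependency cycle can be detected before it bites you at boot. It is not how systemd itself resolves jobs, and it does not parse unit files. It assumes a simplified model with a single "must start after" relation, and the unit names (myapp.service, mydb.service) are hypothetical.

```python
from typing import Dict, List, Optional

# Simplified model: each unit maps to the units it must start after.
# This is an illustration only, not systemd's actual job engine.
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of unit names, or None."""
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(unit: str) -> Optional[List[str]]:
        color[unit] = GRAY
        path.append(unit)
        for dep in deps.get(unit, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: we looped back to it.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[unit] = BLACK
        return None

    for unit in deps:
        if color.get(unit, WHITE) == WHITE:
            cycle = visit(unit)
            if cycle:
                return cycle
    return None


# Hypothetical units: myapp waits for mydb, and mydb waits for myapp.
units = {
    "myapp.service": ["network-online.target", "mydb.service"],
    "mydb.service": ["myapp.service"],
    "network-online.target": [],
}

if __name__ == "__main__":
    print(find_cycle(units))
```

A depth-first search with three colors is enough here: hitting a unit that is still on the current path means the graph is no longer a DAG, and the returned list shows which edge to cut.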

All that is fixable. What I don’t want systemd doing is taking over services with systemd-resolved or any of the other services it wants me to consume. So careful pruning is required.

The article is interesting as it discusses the service layer as the inevitable evolution of the preceding mass of scripts, daemons, and other things that didn’t really have an API or collective control plane. The aspect of tragedy arises in connection with how these changes are perceived by the community.

The whole systemd battle, Rice said, comes down to a lot of disruptive change; that is where the tragedy comes in. Nerds have a complicated relationship to change; it’s awesome when we are the ones creating the change, but it’s untrustworthy when it comes from outside. Systemd represents that sort of externally imposed change that people find threatening. That is true even when the change isn’t coming from developers like Poettering, who has shown little sympathy toward the people who have to deal with this change that has been imposed on them. That leads to knee-jerk reactions, but people need to step back and think about what they are doing.

I will freely admit I was quite skeptical about this massive change when it occurred, across so many leading distributions, all around the same time. But it has made things generally better, even while it’s stepped on many toes.

Systemd is tightly tied to Linux though, and isn’t easily portable. From the article:

Systemd makes no attempt to be portable to non-Linux systems, which leads to a separate class of complaints. If systemd becomes the standard, there is a risk that non-Linux operating systems will find themselves increasingly isolated. Many people would prefer that systemd stuck to interfaces that were portable across Unix systems, but Rice had a simple response for them: “Unix is dead”. Once upon a time, Unix was an exercise in extreme portability that saw some real success. But now the world is “Linux and some rounding errors” (something that, as a FreeBSD person, he finds a little painful to say), and it makes no sense to stick to classic Unix interfaces. The current situation is “a pathological monoculture”, and Linux can dictate the terms that the rest of the world must live by.

I’ve heard colleagues talk about the pathological monoculture in the past. I’m not a fan of that language, but his points are, if nothing else, brutally honest. So I wouldn’t expect to see systemd on FreeBSD any time soon. And it seems that there is, er … less of an appetite for something like that anyway.

The second article is on the implications of FreeBSD’s ZFS being based upon ZoL rather than illumos going forward. Specifically, the author discusses potential risks to downstream projects due to risks in ZoL itself.

As a quick clarification, FreeBSD uses OpenZFS. It has been using illumos as its upstream. This should be fairly reasonable from the perspective of code porting and licensing. But as Matt Macy pointed out:

In the past few years the vast majority of new development in ZFS has taken place in DelphixOS and zfsonlinux (ZoL). Earlier this year Delphix announced that they will be moving to ZoL https://www.delphix.com/blog/kickoff-future-eko-2018 This shift means that there will be little to no net new development of Illumos. While working through the git history of ZoL I have also discovered that many races and locking bugs have been fixed in ZoL and never made it back to Illumos and thus FreeBSD.

So he’s pointing out that the center of mass for this work has moved to ZoL. This could be viewed as a positive, if the licenses allowed for inclusion into the Linux kernel.

Remember, license incompatibilities are problematic between GPL-licensed Linux and other, non-GPL code. ZFS is CDDL licensed, and the CDDL is GPL incompatible. Curiously, Oracle provides ZFS with its OEL system, though I don’t know if it is based upon OpenZFS, or its own code base from Solaris 11+.

One might think that this is a minor concern, as it is fine to keep developing ZoL out of kernel.

Except, in the 5.0-rc releases, a set of functions was removed as there are no in-kernel users of them anymore. Pruning functions may be a great idea, but in the process, it broke the SIMD-enabled CRC calculation that ZoL was using.

So while the article talked about ZoL as a risk to FreeBSD’s use of OpenZFS, commenters pointed out that this risk isn’t theoretical, as the original poster had implied. It’s a real risk. If ZoL is forced to code around breakage due to licensing, which it is, then those workarounds are going to be in the code base. And that is the code base ZFS on FreeBSD (ZoF) will be using.

The Linux kernel team expressed it thusly:

My tolerance for ZFS is pretty non-existant. Sun explicitly did not want their code to work on Linux, so why would we do extra work to get their code to work properly?

I am not sure how to solve this issue. I don’t know if Oracle would multi-license its ZFS bits, or if it is happy with the way things are now. This is not a company known for giving away functionality it can charge for.

I don’t see the Linux kernel team changing its mind any time soon. So the risk to ZoL is real, and accumulates in ZoF. Basically this isn’t pretty. ZoL doesn’t have corporate sponsors; it’s a project of LLNL. OpenZFS has many users, and few developers. And one of its major developers, as pointed out, switched to ZoL.

The primary driver for ZoL is Lustre. Lustre isn’t the only parallel filesystem on the market. There are many others: BeeGFS, Weka.IO, OrangeFS, LizardFS, CephFS, and (if you squint hard) pNFS. There are storage appliance vendors like Panasas, Vexata, E8, DDN, Cray, IBM, DellEMC, and so on. Lustre is important to DDN and Cray, less so to DellEMC.

You can run Lustre atop non-ZoL systems, though most everyone uses ZoL by now. Many moons ago (pre 2.x) I ran it on ext4. I think I saw xfs used by the SuSE folks in the 1.6-ish days.

Basically what I am noting is that Lustre is one of several parallel file systems out there, and ZoL is important to it. BeeGFS runs atop ZoL if you like, but doesn’t need to. So the drive for ZoL is primarily Lustre, which isn’t as popular as it once was, and has significant competition.

Moreover, ZoL is effectively “funded” by LLNL (developer salaries). What if this were to change?

All of these things need to factor into the risk equation. This is where the monoculture comment becomes important.

Call this a cautionary sense of foreboding. I am not personally dependent upon ZoL working right now. Maybe eventually. But I also don’t like to see useful tools rendered less useful for what amounts, by now, to historical spite.

Greg Kroah-Hartman’s comments about Sun on the LKML are correct. But there is no such company as Sun anymore. The people who were there are scattered (and yes, I work with a number of them, and they are terrific engineers, though like GK-H, highly opinionated).

The question is whether or not Oracle would support multiple licensing of ZFS to enable better inclusion in the Linux kernel. Then we wouldn’t have to accept the increased risk of the Linux kernel eventually being completely inhospitable to ZoL. Which would increase risk to ZoF. And everything else.