Distribution package dependency radii, or why distros may be doomed

I am a sucker for a good editor. I like Atom. Don’t yell at me. It’s pretty good for my use cases. It has lots of nice extensions that I can use, and have used.

Atom is not without its dependencies though. Installing it, which should be relatively simple, turns out to be … well … interesting.

[root@centos7build nyble]# rpm -ivh ~/atom.x86_64.rpm 
error: Failed dependencies:
	libXss.so.1()(64bit) is needed by atom-1.26.0-0.1.x86_64

In searching the interwebs for what Xss is, I happened across this little tidbit:

Experiencing this on CentOS 7 as well. Running yum install libXScrnSaver works there as a workaround.

While you might think, hey great, he found something that worked, let me briefly describe this machine to you, so we can decide if it is really “great”.

This is an HVM guest, running on KVM, with no display. It is a build machine I set up for building my NyBLE stack. It’s running locally on one of my 10-year-old boxen. I wanted to put Atom on there, as it is on my gigabit LAN, and I don’t mind remote X here at home. The editor performs nicely from my laptop.

That is, in order to use a text editor, I had to install a screensaver on a headless VM.

Ok, this isn’t quite the worst of it. While I was researching the most minimal installation of CentOS I could get away with that would still be fully functional for my purposes, I started looking into the package groupings. Take the “Minimal Install” grouping, which I would assume would install the bare minimum I need to get an operational system.

[root@centos7build ~]# yum group info "Minimal Install"
...
Environment Group: Minimal Install
 Environment-Id: minimal
 Description: Basic functionality.
 Mandatory Groups:
   +core
 Optional Groups:
   +debugging

Ok, it is built upon the core group. Let’s look at that.

[root@centos7build ~]# yum group info "core"
Loaded plugins: fastestmirror
...
Group: Core
 Group-Id: core
 Description: Smallest possible installation.
 Mandatory Packages:
    audit
    basesystem
    bash
   +btrfs-progs
    coreutils
    cronie
    curl
    dhclient
    e2fsprogs
    filesystem
   +firewalld
    glibc
    hostname
    initscripts
    iproute
    iprutils
    iptables
    iputils
    irqbalance
    kbd
    kexec-tools
    less
    man-db
    ncurses
    openssh-clients
    openssh-server
    parted
    passwd
    plymouth
    policycoreutils
    procps-ng
    rootfiles
    rpm
    rsyslog
    selinux-policy-targeted
    setup
    shadow-utils
    sudo
    systemd
    tar
   +tuned
    util-linux
    vim-minimal
    xfsprogs
    yum
 Default Packages:
   +NetworkManager
   +NetworkManager-team
   +NetworkManager-tui
   +NetworkManager-wifi
   +aic94xx-firmware
   +alsa-firmware
    biosdevname
    dracut-config-rescue
   +ivtv-firmware
   +iwl100-firmware
   +iwl1000-firmware
   +iwl105-firmware
   +iwl135-firmware
   +iwl2000-firmware
   +iwl2030-firmware
   +iwl3160-firmware
   +iwl3945-firmware
   +iwl4965-firmware
   +iwl5000-firmware
   +iwl5150-firmware
   +iwl6000-firmware
   +iwl6000g2a-firmware
   +iwl6000g2b-firmware
   +iwl6050-firmware
   +iwl7260-firmware
   +iwl7265-firmware
    kernel-tools
    libsysfs
    linux-firmware
    microcode_ctl
   +postfix
 Optional Packages:
   dracut-config-generic
   dracut-fips
   dracut-fips-aesni
   dracut-network
   initial-setup
   openssh-keycat
   rdma-core
   selinux-policy-mls
   tboot

Look at all that nice firmware I don’t need/want. As part of core. Which is part of “Minimal Install”.

And NetworkManager. And postfix.
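
To put a number on that, here is a minimal Python sketch that parses `yum group info` output in the format shown above and counts the firmware packages in each package list. It assumes the output has been captured as text; the function names are hypothetical, and the one-character marker yum prints before some names (such as `+`) is simply stripped rather than interpreted.

```python
import re
from collections import defaultdict

# Package-list headers as they appear in the `yum group info` output above.
SECTION_RE = re.compile(r"^\s*(Mandatory|Default|Optional|Conditional) Packages:\s*$")


def parse_group_info(text):
    """Map each package-list section ('Mandatory', 'Default', ...) to its names.

    The one-character marker yum prints before some names (e.g. '+') is
    stripped, not interpreted.
    """
    sections = defaultdict(list)
    current = None
    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            current = header.group(1)
            continue
        name = line.strip()
        if not name:
            continue
        if current is None or ":" in name:
            # Outside a package list, or a "Key: value" line such as "Group-Id: core".
            current = None
            continue
        sections[current].append(name.lstrip("+=-"))
    return dict(sections)


def firmware_counts(sections):
    """Count the '*-firmware' packages in each section."""
    return {sec: sum(p.endswith("-firmware") for p in pkgs) for sec, pkgs in sections.items()}
```

Saving the output with `yum group info core > core.txt` and passing the file’s text through both functions gives a per-list tally; against the listing above, the Default list alone holds 21 firmware packages.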

The problem is, when you install other packages, they will often pull in large additional swaths of dependencies to provide a library or a function which isn’t really used or needed.

Like the libXss with atom.
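
The “dependency radius” can be made concrete by treating packages as a directed graph and computing everything reachable from the one package you actually asked for. Below is a minimal Python sketch using breadth-first search, which also records how many hops out each package gets pulled in; here the radius is taken to be the deepest hop count. The graph is a toy: only the `atom` → `libXScrnSaver` edge mirrors the failure above, the X library edges are illustrative, and the `gui-toolkit`, `font-stack` and `doc-tools` packages are hypothetical. In practice you would populate the graph from your package manager’s metadata.

```python
from collections import deque


def dependency_closure(graph, root):
    """Return {package: depth} for everything reachable from root (BFS).

    graph maps a package name to the names it directly requires.
    Depth 0 is root itself, depth 1 its direct dependencies, and so on.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, ()):
            if dep not in depth:  # in BFS the first visit is the shortest path
                depth[dep] = depth[pkg] + 1
                queue.append(dep)
    return depth


def radius(graph, root):
    """The dependency radius: the deepest level reached from root."""
    return max(dependency_closure(graph, root).values())


# Toy graph for illustration only, not real RPM metadata.
TOY_GRAPH = {
    "atom": ["libXScrnSaver", "gui-toolkit"],  # gui-toolkit: hypothetical
    "libXScrnSaver": ["libX11", "libXext"],
    "libXext": ["libX11"],
    "gui-toolkit": ["font-stack"],  # hypothetical
    "font-stack": ["doc-tools"],  # hypothetical
}
```

Here `radius(TOY_GRAPH, "atom")` is 3: asking for one editor drags in six other packages, the farthest of them three hops away.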

What I want, as a user, is to be able to build/install a minimal-sized, effectively appliance-like OS. I don’t want/need massive dependency radii. In fact, such a thing is an anti-pattern. It is the opposite of what I need.

Assume I am building minimal-sized installations as appliances I can deploy quickly and easily. Do I really need LaTeX installed because of an obscure tertiary (or worse) dependency in the graph?

No.

And, to make matters worse, if you try to excise the unneeded packages, you invariably risk the integrity of the package set you are attempting to install, as that dependency graph is unforgiving.
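
The unforgiving part is easiest to see from the other direction: before excising a package, you need everything that transitively requires it, because all of those break too. A minimal Python sketch, assuming a hypothetical toy graph in the same shape (package → direct requirements): invert the edges, then walk them.

```python
from collections import defaultdict, deque


def reverse_graph(graph):
    """Invert package -> requirements into package -> direct dependents."""
    rev = defaultdict(set)
    for pkg, deps in graph.items():
        for dep in deps:
            rev[dep].add(pkg)
    return rev


def removal_fallout(graph, victim):
    """Packages left with unmet dependencies if victim is removed.

    That is everything that transitively requires victim (victim excluded).
    """
    rev = reverse_graph(graph)
    broken = set()
    queue = deque([victim])
    while queue:
        pkg = queue.popleft()
        for dependent in rev.get(pkg, ()):
            if dependent != victim and dependent not in broken:
                broken.add(dependent)
                queue.append(dependent)
    return broken


# Hypothetical toy graph (package -> direct requirements), not real RPM data.
TOY_GRAPH = {
    "appliance-app": ["libfoo", "libbar"],
    "libfoo": ["libcore"],
    "libbar": ["libcore", "libextra"],
    "report-tool": ["libextra"],
}
```

Removing `libextra`, which looks like it only serves `report-tool`, also breaks `libbar` and with it the `appliance-app` you actually wanted. On an RPM system, `rpm -e --test <package>` runs a comparable check against the real database without removing anything.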

If I were a conspiracy theorist (I am not), I might indulge a flight of fancy about distro lock-in via extended dependency radii.

I am, or like to think of myself as, pragmatic, which means I try not to let things like this get in the way of doing what I need to get done. And please understand, these dependency radii are inhibitors, barriers to be overcome in many cases.

They aren’t helpful.

I don’t see distros adding much value beyond bug databases and some packaging/tooling. When the latter gets in the way of the former, I’ll do what I can to contain the damage.

I know some people are as wedded to their distros as others are to their OSes. Distros, OSes, etc. are details of an instance, not the purpose of the instance. They are tools to be used, and to be swapped out for something that does work when they don’t. No one tool is perfect; no one distro, OS, kernel, or toolchain is beyond question.

This is one of the reasons I question the longevity of distros that attempt to enforce their world view through wide dependency radii. There is a long tail of users for everything, and the ardent defenders of a particular way of thinking will defend it to the last developer, even as the world moves on and leaves them behind.

Let’s encourage the distro people to rethink what it is they deliver, so that minimal is really minimal. There is far more value in that than in trying to tie everything together through package management and package dependency graphs that thwart people’s attempts to reduce footprint and increase efficiency.

NyBLE

So there I am, updating my new repository to enable using ZFS in the ramboot images. This is a simplification and continuation of the work I did a few years ago, with some massive code cleanups. And sadly, no documentation yet. I will fix that soon, but for now, I am trying to hit the major functionality points.

NyBLE is a Linux environment for hypervisor hosts. It builds on the old open source SIOS work, and extends it in significant ways. Basically, we want to be able to PXE boot (or locally boot, if you prefer, but managing OS or boot drives is something you never want to consider at scale) a fully working OS into a ramdisk.

I have this working nicely for Debian 9, 8, and others, and for CentOS 7 (and RHEL 7 by extension). Eventually I’ll add others.

The purpose of NyBLE is to build what we want to use, not what the distro or image builder wants us to use. It is not a Linux From Scratch. It starts with a base distribution, so as to make questions like “but it’s X and we like Y, so why are you forcing us to change?” largely irrelevant.

OSes are, generally, a detail of an implementation. Distros are packagings of OSes in particular ways that, in theory, should impart some value to their users and consumers.

In the past, this may have been quite true. Currently, the only real values distros provide are centralized bug databases, security patches, and approximately working configurations.

Distros carry significant baggage. Ancient toolchains, often EOLed for years, are being shipped with “modern” distros. Just remember, they are more “stable”. Or something.

Distros often have unfortunate dependency radii. You can’t simply install package X, as it also depends on some functionality in Y and Z, which themselves depend upon … . This means that building small-footprint installations with all the functionality you need (say, ones ideal for memory booting) becomes fraught with the peril of unmet dependencies.

While this is all … well … a fun challenge, think of it as fitting together puzzle pieces where you only approximately know the shape of the pieces and the picture they display. And of course, each iteration of fitting things together could take minutes to hours …

… now add a temporal dimension to this. Suppose you want to compile kernel support for a subsystem. For a number of reasons, this requires that the kernel-devel tree exist. And suppose the kernel-devel tree is very large, large enough that including it makes PXE booting impossible.

So … now what you need to do is to temporarily load the tree, build and install the modules, and then remove the tree.

Sort of like cooking, where you need the essence of kernel, but not the thing providing the essence, as it would make the image too large.

This is the joy I am working through now. It works great, BTW, in Debian. Not so well in CentOS, with its huge tree size.
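
The “temporarily load the tree, build, remove” dance can be sketched as a small Python wrapper whose key property is that removal runs in a `finally` block, so a failed build can’t leave the tree behind in the image. This is a sketch under assumptions, not NyBLE’s code: the package names (kernel-devel-<ver> on CentOS, linux-headers-<ver> on Debian) are the usual ones, the build command is whatever your module needs (a DKMS invocation such as `dkms autoinstall -k <ver>` is one possibility), and `with_temporary_devel_tree` is a hypothetical name.

```python
import subprocess

# Assumed package naming for the kernel build tree on each family.
DEVEL_TREE = {
    "centos": {
        "install": ["yum", "install", "-y"],
        "remove": ["yum", "remove", "-y"],
        "pkg": "kernel-devel-{ver}",
        "cleanup": [],
    },
    "debian": {
        "install": ["apt-get", "install", "-y"],
        "remove": ["apt-get", "purge", "-y"],
        "pkg": "linux-headers-{ver}",
        # Also drop automatically pulled-in pieces (e.g. the -common headers).
        "cleanup": [["apt-get", "autoremove", "-y"]],
    },
}


def with_temporary_devel_tree(distro, kernel_ver, build_cmd, run=subprocess.run):
    """Install the kernel build tree, run build_cmd, then always remove the tree.

    `run` is injectable so the sequence can be dry-run or tested without root.
    Returns the commands issued, in order.
    """
    spec = DEVEL_TREE[distro]
    pkg = spec["pkg"].format(ver=kernel_ver)
    issued = []

    def step(cmd):
        issued.append(cmd)
        run(cmd, check=True)

    step(spec["install"] + [pkg])
    try:
        step(build_cmd)
    finally:
        # Runs even if the build failed, so the tree never ships in the image.
        step(spec["remove"] + [pkg])
        for cmd in spec["cleanup"]:
            step(cmd)
    return issued
```

For a dry run, pass a `run` that prints instead of executing, e.g. `run=lambda cmd, check: print(" ".join(cmd))`.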

The things we do to make end users’ lives easier and friction free …

Dealing with disappointment

In the last few years, I’ve had major professional disappointments: the collapse of Scalable, and some of the positively ridiculous things associated with its aftermath, none of which I will write about until they are over. They are almost over, but not quite; I am waiting for confirmation. Then there was my job search last year, and some of the disappointment associated with that.

Recently I’ve had different types of disappointments, which I won’t get into in detail. The way I’ve dealt with these things in the past has been to try to understand, if there was a conflict, what I could have done better, regardless of whether I was in the right or the wrong, and to try to look at things from a different viewpoint.

Dealing with the collapse of Scalable has been a challenge, personally and professionally. I can’t get into all the reasons why right now, as we are awaiting conclusions on a few things. Dealing with what came next, from the groups that thought they would cherry-pick the remains and tie me into their organizations on the cheap, was tremendously frustrating.

Value for me would have been help with relieving the personal monetary burden imposed. Not indentured servitude, which is what was effectively proposed. Doing due diligence on some of the folks who reached out to me suggested that this would be their behavior. I naively hoped against hope that this time would be different, that their past behavior would not be reflected in their dealings with me.

Yeah, I’ve got lots to write about there someday. This I am dealing with, and I’ll eventually write about it.

How to deal with disappointment … where there is conflict, I try to work with people to understand the nature of the conflict. I’ve found in the past that keyboards and asynchronous messaging systems tend to amplify conflict, while personal contact and phone calls enable you to more accurately express thoughts and listen.

As long as I am conversing with a person who wants to get to a point of resolution, or even to agree to disagree, I can manage. What I find disappointing is when this is not the case: when the people involved seem more focused upon the argument, or upon a narrative that doesn’t mesh with reality. I take people to task for stuff like that, on my teams and on other teams.

As Mark Twain may have once written

A lie can travel halfway around the world before the truth can get its boots on

Information asymmetry is a tool of those who wish to manipulate. If you inject falsehoods into discussions, you can get your opponents to waste their energy countering a massive flow of accusations. This is basically how political agitprop channels work here in the US.

It is sometimes called “throw stuff at the wall and see what sticks”, but done in parallel, with many throwers. Usually it is done to protect ideologies. The instigators know they are on shaky ground. Many are not thinking about the big picture, or about “be careful what you wish for, you just might get it.”

I try to turn conflict into a way to improve myself. This way, disappointment that a conflict exists becomes a mechanism for self-reflection.

But when that conflict is being seeded with nefarious motives, and people I otherwise genuinely respect and like are actively engaged in it … yeah, not so much I can do there other than to disengage, let their fire cool.

Later on, when they are ready to re-engage, I can decide if I really should or not, as past unprofessional behavior may indicate significant issues with potential future unprofessional behavior. I am too old for drama, and intrigue. I don’t have time to waste on stupid arguments, and ridiculous narratives.

Attack the premise, not the person. Reflect deeply before attacking the person. Accusations have a way of metastasizing. Asynchronous messaging systems act as a positive feedback loop on them, while phone calls act as a negative feedback loop.

If someone offers to call you to try to understand your issues, take ’em up on it. Chances are they’ve dealt with bad stuff before and know that this is the better route to take.

Late Feb 2018 update

Again, many apologies for the low posting frequency. Several things are nearing completion (hopefully soon), and I want them finalized first.

That said, the major news is that this site is now on a much improved server and network. I’ve switched from Comcast Business to WOW Business. So far: much better speed, more consistent performance, and far lower cost per unit of bandwidth.

I do have lots to write about, and have been saving things up until after this particular objective is met, so I can work/write distraction free.

More soon.

Apologies on the slow posting rate

Many things are going on simultaneously right now, and I have little time to compose thoughts for the blog. I anticipate a bit of a letup in the next week or two as the year comes to a close.

Cool bug on upgrade (not)

WordPress is an interesting beast. I spent hours on an upgrade working through issues I shouldn’t have needed to, because some functions were deprecated.

In an interesting way. By removing them, and throwing an error. Which I found only through looking at a specific log.

So out goes that plugin. And the site is back.

#SC17

I’ve had numerous requests from friends and colleagues about whether I will be attending #SC17 this year. Sadly, this is not to be the case. $dayjob has me attending an onsite meeting that week in San Francisco, and the schedule was such that I could not attend the talks I was interested in.

I’d love for there to be a way to listen to the talks remotely. Maybe I’ll simply buy the DVD/USB stick of the talks if there is an online store for them.

Next year at #SC18 in Dallas if possible.

Enjoy, have fun @BeowulfBash, and please tweet/post what you see and hear.

And, for those who are not aware of some of the most awesome hardware out there for big data analytics and deep learning, have a look at @thedeadline and Basement Supercomputing. Best in market, designed and built by people who know how to use the machines, what they are used for, and why.

(unpaid/uncompensated endorsement … get out and support the small HPC guys, the ones who actually know what they are doing).

Disk, SSD, NVMe preparation tools cleaned up and on GitHub

These are a collection of (MIT licensed) tools I’ve been working on for years to automate some of the major functionality one needs when setting up/using new machines with lots of disks/SSD/NVMe.

The repo is here: https://github.com/joelandman/disk_test_setup . I will be adding some SAS secure erase and formatting tools to it.

These tools wrap other lower level tools, and automate the common tasks you worry about when you are setting up and testing a machine with many drives. Usage instructions are in the code, at the top … I will eventually add better documentation.

Here is the current list of tools (note: they aren’t aware of LVM yet; let me know if you would like this):

  • disk_mkfsxfs.pl : takes every disk or SSD that is not mounted or part of an MD RAID, and creates a file system on it, plus a mount point under a path of your choosing (/data by default). This is used with the disk_fio.pl code.
  • disk_fio.pl : generates simple disk fio test cases, with which you can probe the IO performance to the file system across many devices. The file names it generates include the case type (read, write, randread, randwrite, randrw), the blocksize, and the number of simultaneous IOs to each LUN. This will leverage the mount point you provide (defaults to /data) and will create test directories below it so you don’t get collisions. To run the test cases, you need fio installed.
  • disk_wipefs.pl : removes any trace of file system metadata on a set of drives if they are not mounted and not part of an existing MDRAID.
  • ssd_condition.pl : runs a conditioning write on a set of SSDs if they are not mounted or part of an MDRAID. If you are setting up an SSD based machine, you are, of course, running something akin to this before using the SSDs … right? Otherwise, you’ll get a nasty performance and latency shock after you transition from the mostly unallocated block scenario to the completely allocated scenario. This is especially painful when the block compression/garbage collection passes come through to help the FTL find more space to write your blocks. You can tell you are there if you see IO pauses after long sequences of writes. Conditioning also helps improve the overall life of the drive. See this presentation around slide 7 and beyond for more info.
  • sata_secure_erase.pl : completely wipes the SSD (works with rotational media as well).
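
The common thread in these tools is the safety filter: only touch drives that are neither mounted nor members of an MD RAID. The Perl code in the repo is the authority on how that is actually done; what follows is a minimal Python sketch of the same idea. It assumes the text of /proc/mounts and /proc/mdstat is passed in (so it can be tested off-box), and that a partition counts against its parent disk. The partition-to-disk rule is a simplification that only understands sdX-style and NVMe names, and like the originals, the sketch is not LVM-aware. All function names are hypothetical.

```python
import re


def parent_disk(dev):
    """Map a partition name to its disk: sda1 -> sda, nvme0n1p2 -> nvme0n1.

    Simplified naming rules (assumption): sdX/vdX style and NVMe names only;
    anything else is returned unchanged.
    """
    m = re.match(r"^(nvme\d+n\d+)(p\d+)?$", dev)
    if m:
        return m.group(1)
    m = re.match(r"^([a-z]+)\d*$", dev)
    return m.group(1) if m else dev


def busy_disks(mounts_text, mdstat_text):
    """Disks that hold a mounted file system or are MD RAID members."""
    busy = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/dev/"):
            busy.add(parent_disk(fields[0][len("/dev/"):]))
    for line in mdstat_text.splitlines():
        # Array lines look like "md0 : active raid1 sdb1[1] sda1[0]".
        if re.match(r"^md\S* : ", line):
            for member in re.findall(r"(\S+)\[\d+\]", line):
                busy.add(parent_disk(member))
    return busy


def candidate_disks(all_disks, mounts_text, mdstat_text):
    """Disks from all_disks that are safe to format, wipe, or condition."""
    busy = busy_disks(mounts_text, mdstat_text)
    return sorted(d for d in all_disks if d not in busy)
```

On a live system, the disk list might come from the entries in /sys/block (which also include loop and md devices that you would want to skip), with the mounts and mdstat text read from /proc.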

The user interface for these tools is admittedly spartan, and documented at the top of the code itself. This will improve over time. Have at them, MIT licensed, and please let me know if you use them or find them useful.
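
To illustrate the kind of test matrix disk_fio.pl builds, here is a minimal Python sketch that emits one fio job file per combination of access pattern, block size, and queue depth, naming each file after those three parameters. The naming scheme, the `libaio` engine, `direct=1`, the 1 GB size, the 60 second time-based runtime, and the per-mount `fio_test` directory are all assumptions, not what the Perl tool actually writes; the job options themselves are standard fio parameters.

```python
import itertools

PATTERNS = ["read", "write", "randread", "randwrite", "randrw"]


def fio_job(pattern, bs, iodepth, mounts):
    """Return (filename, job file text) for one test case across all mounts.

    Each mount (e.g. /data/sdb) gets its own job section pointed at a
    fio_test subdirectory (which must exist before running fio).
    """
    name = f"{pattern}-bs{bs}-qd{iodepth}.fio"
    lines = [
        "[global]",
        "ioengine=libaio",
        "direct=1",
        f"rw={pattern}",
        f"bs={bs}",
        f"iodepth={iodepth}",
        "size=1g",
        "runtime=60",
        "time_based",
    ]
    for mnt in mounts:
        section = mnt.strip("/").replace("/", "_")
        lines += ["", f"[{section}]", f"directory={mnt}/fio_test"]
    return name, "\n".join(lines) + "\n"


def all_jobs(mounts, block_sizes=("4k", "1m"), depths=(1, 32)):
    """One job file per (pattern, block size, queue depth) combination."""
    return dict(
        fio_job(p, bs, qd, mounts)
        for p, bs, qd in itertools.product(PATTERNS, block_sizes, depths)
    )
```

Writing each entry of the returned dict to disk gives files that can be run individually, e.g. `fio randread-bs4k-qd32.fio`.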

Aria2c for the win!

I hadn’t heard of aria2c before today. Sort of a super wget, as far as I can tell. It does parallel transfers to reduce data motion time, where possible.

So I pulled it down and built it. I have some large data sets to move, and a nice storage area for them.

Ok.

Fire it up to pull down a 2GB file.

Much faster than wget on the same system over the same network. Wow.

Then the rest of the ML data set. About 120GB in all.

Yeah, this is a good tool. Need to make sure it is on all our platforms.

Sort of like gridftp but far more flexible.

Definitely a good tool.
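
The parallel transfers mentioned above are, as a general technique, segmented downloads: ask the server for the file’s length, split it into byte ranges, and fetch the ranges concurrently. Here is a minimal Python sketch of that idea using only the standard library. It is not how aria2c is implemented; it assumes an HTTP server that honors Range requests and reports Content-Length, buffers each segment in memory, and skips retries, resumption, and multiple mirrors. The function names are hypothetical. In aria2c itself, the analogous knobs are `--split` (`-s`) and `--max-connection-per-server` (`-x`).

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, segments):
    """Split bytes [0, size) into up to `segments` contiguous inclusive ranges."""
    if size <= 0 or segments <= 0:
        return []
    segments = min(segments, size)
    base, extra = divmod(size, segments)
    ranges, start = [], 0
    for i in range(segments):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start, end):
    """Value for an HTTP Range request header (inclusive byte range)."""
    return f"bytes={start}-{end}"


def fetch_segment(url, path, start, end):
    """Download one byte range and write it in place at its offset."""
    req = urllib.request.Request(url, headers={"Range": range_header(start, end)})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:  # 206 Partial Content means the range was honored
            raise RuntimeError(f"server ignored Range request (HTTP {resp.status})")
        data = resp.read()
    with open(path, "r+b") as f:
        f.seek(start)
        f.write(data)
    return len(data)


def segmented_download(url, path, segments=8):
    """Fetch url into path using `segments` concurrent range requests."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])
    with open(path, "wb") as f:
        f.truncate(size)  # preallocate so each segment can be written in place
    with ThreadPoolExecutor(max_workers=max(1, segments)) as pool:
        futures = [pool.submit(fetch_segment, url, path, s, e)
                   for s, e in split_ranges(size, segments)]
        total = sum(fut.result() for fut in futures)
    if total != size:
        raise RuntimeError(f"got {total} bytes, expected {size}")
    return size
```

Each worker writes a disjoint region through its own file handle, so no locking is needed; whether more segments actually help depends on where the bottleneck is (per-connection throttling on the server side is the case this approach targets).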
