On the broken-ness of most Linux distributions …

If you have anything approaching a complex installation or management requirement for your systems, most … no … pretty much all Linux distributions have designs that range from somewhat borked to completely boneheaded when it comes to handling these complex situations.

Say, for example, you want to boot a diskless NFS system, and replicate it. Diskless NFS is well known to be an easy to manage scenario … one system to manage, very scalable from an admin point of view.

But it is pretty much impossible on most distributions, out of the box, without some seriously major hacking. And it's not because they omit support for this … no … it has to do with their designs. They are broken, in a fundamental manner, when it comes to handling anything of complexity.

Try an iSCSI installation. Set up a system so that it installs to an iSCSI target, and then correctly boots from it.

Or try a system where you need to correctly assemble a software RAID a priori. We have some tools and tweaks against mkinitrd in CentOS, but largely, the boot design of these distros works actively against you in dealing with anything remotely close to a complex system.

The whole mkinitrd/mkinitramfs system is all about carrying along the modules, scripts, and extra config bits that you would ordinarily pull from disk or a remote machine. But the modular kernel builds, and the scripts associated with figuring out what to do, actively get in the way of keeping the initrd/initramfs correctly in sync with the installed system. This leads to long debugging sessions (often hours) trying to figure out what went bang after an upgrade.
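One way to catch that drift before a reboot is to compare what the initrd actually carries against what the boot path needs. Here is a minimal sketch in Python, assuming you already have a plain list of file paths from the image (pulled out of a cpio listing or similar). The function names and the example module list are my own illustration, not something any distro ships.

```python
import posixpath

# Suffixes a kernel module file can carry (plain or compressed).
MODULE_SUFFIXES = (".ko", ".ko.gz", ".ko.xz", ".ko.zst")


def module_name(path):
    """Return the normalized module name for a path, or None if it is not a module."""
    base = posixpath.basename(path)
    for suffix in MODULE_SUFFIXES:
        if base.endswith(suffix):
            # The kernel treats '-' and '_' in module names as equivalent.
            return base[: -len(suffix)].replace("-", "_")
    return None


def missing_modules(listing, required):
    """Return the required modules that do not appear in the initrd file listing."""
    present = {name for name in map(module_name, listing) if name}
    wanted = (r.replace("-", "_") for r in required)
    return sorted(m for m in wanted if m not in present)


if __name__ == "__main__":
    # Hypothetical listing and requirements, for illustration only.
    listing = [
        "lib/modules/x/kernel/drivers/md/raid1.ko.xz",
        "lib/modules/x/kernel/drivers/scsi/iscsi_tcp.ko",
        "etc/mdadm.conf",
    ]
    print(missing_modules(listing, ["raid1", "raid456", "iscsi-tcp", "nfs"]))
```

Run something like this after every kernel or initrd rebuild and you at least find out about the missing RAID or iSCSI module before the machine fails to come back up.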

Compare this to something like SystemRescueCd, which upon boot assembles RAIDs and does everything else we need, without doing much in the way of an error-prone switch-root or root-pivot. Those are fairly dangerous parts of the boot process in many of the major distros, as this is what inevitably fails in the complex environment scenario.

What you really want is for the OS and kernel to fully boot, correctly, and then assemble RAIDs and the other bits (iSCSI, NFS, etc.). SystemRescueCd seems not to have a problem doing this. CentOS/RHEL? Not quite so good at it.
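The ordering I have in mind looks something like the sketch below: once the kernel and base OS are up, storage bring-up becomes a series of ordinary, checkable steps instead of something buried in the initrd. The step list and the `bring_up` helper are hypothetical. The commands are the usual iscsiadm, mdadm, and mount invocations, and the default is a dry run that only prints what would be done.

```python
import subprocess

# Post-boot storage steps, in dependency order (hypothetical plan for illustration).
STEPS = [
    ("iSCSI login", ["iscsiadm", "-m", "node", "--loginall=automatic"]),
    ("RAID assembly", ["mdadm", "--assemble", "--scan"]),
    ("mount filesystems", ["mount", "-a"]),
]


def bring_up(steps=STEPS, dry_run=True, runner=subprocess.run):
    """Run each step in order; stop at the first failure.

    Returns (completed_step_names, failed_step_name_or_None).
    """
    completed = []
    for name, cmd in steps:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            result = runner(cmd)
            if result.returncode != 0:
                return completed, name
        completed.append(name)
    return completed, None
```

The point is not the code itself but where the failure shows up: if RAID assembly fails here, you have a fully booted system, a shell, and logs to debug with, rather than a dead machine stuck partway through a root switch.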

So I am thinking we might need to replace the init bits with something far more intelligent, and able to handle complexity. The current system really doesn’t work well at all, apart from a limited range of use cases.

Yeah, dealing with some real joyful stuff right now, and the RHEL/CentOS startup system isn’t doing a terribly good job. Some of it could be the kernel, which is … a Sun-built thing … for Lustre. This could explain it … and one of the reasons why we build our own kernels is that we want things to work … right … with no messing around.

But … it's frustrating, and very wasteful of my time, to have to go back and fix this stuff again and again.
