The joys of automated tooling … or … catching changes in upstream projects’ workflows by errors in yours

We have an automated build process for our boot images. It is actually quite good, allowing us to easily integrate many different capabilities with it. These capabilities are usually encapsulated in various software stacks that provide specific functionality.
Most of these stacks follow pretty well defined workflows. For a number of reasons, we find building from source generally easier than package installation, as the package builds often make some, well, effectively random (and often poor) choices in build options and file placement. Not to mention the often insane package dependency radii, which require you to bring in a positively massive number of mostly useless packages in order to get a specific capability (see the Linux kernel perf man page build for example … it needs LaTeX … yes, really, via asciidoc and other tools that also need to pull in fonts, and X and … /sigh ). In that case, the issue is in the kernel’s build environment for that part (it is badly borked). I’d love to attack it and fix it, but alas, I simply don’t have the time.
Well, for the most part, the stacks we need are well defined in terms of their build environments. You only get small changes with minor release version changes.
Not so fast here.
Two of the packages we use have pre-built binaries, and source build options. In one case, the pre-built binaries are built for approximately the right base distro, but the wrong kernel version. Not a problem in most cases, as the build environment has the right kernel version in the build target.
The problem is that the build script changed over the last 2 or 3 months, and the updated version no longer (correctly) respects the kernel version information. It uses a hardwired ‘uname -r’ rather than the kernel version option that it happily accepts on the command line, so it picks up the kernel running on the build host instead of the kernel in the build target. And fails to do the right thing.
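For illustration, here is a minimal sketch of the behavior I would expect from such a script: honor an explicitly supplied kernel version, and only fall back to the running kernel when none is given. This is not the upstream script. The function names and the `--kernel-version` flag are hypothetical, and the `/lib/modules/<version>/build` location is the usual convention for kernel build trees, which I am assuming applies here.

```python
import argparse
import platform
from pathlib import PurePosixPath
from typing import Optional


def resolve_kernel_version(requested: Optional[str] = None) -> str:
    """Pick the kernel version to build against.

    An explicitly requested version always wins. Only when nothing was
    requested do we fall back to the running kernel, which is what
    `uname -r` reports.
    """
    if requested:
        return requested
    return platform.release()


def kernel_build_dir(version: str) -> PurePosixPath:
    """Conventional location of the build tree for a given kernel version
    (assumed layout, not taken from the upstream script)."""
    return PurePosixPath("/lib/modules") / version / "build"


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Show the kernel build tree to use")
    # Hypothetical flag name, standing in for the real script's option.
    parser.add_argument("--kernel-version", default=None)
    args = parser.parse_args()

    version = resolve_kernel_version(args.kernel_version)
    print(kernel_build_dir(version))
```

The point of the design is simply that the fallback lives in exactly one place. A script that calls ‘uname -r’ directly in several spots will quietly ignore the option in whichever spots were missed.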
So for this, we need a two stage scenario, where we build a version without this stack, boot a machine into this, and then rebuild with this stack.
I am planning on fixing this when I get time (the fix mechanism they indicate is actually broken in the same way, as it turns out). Sadly they like to use the DKMS mechanism, which makes no sense for a hardwired build image. Happily we can work around that. Sadly the bugs are in their workaround.
In the second stack causing me grief, they changed a core aspect of their packaged build in recent days. We had been using the official packages to make life faster for a group of customers using this code. Unfortunately, the way they broke the packaged build shows up in other aspects as well, and I am still wrapping my head around all the breakage.
This all goes back to why we have to build critical infrastructure elements ourselves. It is not because we like doing it. We don’t. The problem is that with distro-specific builds, you tie a number of capabilities to the distro, and you inherit the distro’s limits as well. Which, if you are building high performance hyperconverged hardware appliances, gets to be something of an issue. Rather quickly.
Definitely not what I want to be fixing on a Sunday afternoon, but hey, ya gotta do what ya gotta do.