The need to keep building and packaging as separate operations

I am working … no … struggling to “build” OFED 1.2.5.4 for our systems. OFED, for those who are not aware, is the bolus of drivers/infrastructure to support InfiniBand. I won’t get into the IB vs 10 GbE debate here; I see room for both technologies.

There is a catch. The people who develop OFED have made specific choices that make it, generally speaking, extraordinarily hard to debug incorrect builds of OFED, as they have integrated the build process into the packaging process using RPM. It is, as far as I am aware, simply not possible to separate the two.

Which means the errors that I am getting, which show up in all of 3 Google pages, need to be addressed by digging into what the build script is doing, and figuring out if it really meant to be calling rpmbuild this way.

I am attempting to register for a bugzilla account at OpenFabrics, so I can report on what I find. This process has resulted in a hung web page.

Suffice it to say, it is my fervent hope that the OFED community goes back to building from tarballs and configure scripts in 1.3, and scrupulously avoids using RPM as its primary build mechanism. RPM is a package management system. It shouldn’t be used as a build system.

I am fairly certain even Red Hat agrees with this statement.

Update: So I hacked the build.sh and build_env.sh to remove the exits for ‘errors’ such as a particular rpm not being installed. That I had to do this argues strongly in favor of the thesis I had put forth. But of course, it gets worse.

Now, deep in the bowels of one of the source RPMs, we have a changed kernel interface.

/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.4/drivers/infiniband/core/mad.c:2970: error: too many arguments to function ‘kmem_cache_create’

This means, I have to unwind the RPM on another machine, fix the tarball, rewind the RPM, move it to this machine, and then try the rebuild.

Again, this is wrong. Very wrong.
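For anyone who has not had to do it, the unwind/rewind cycle looks roughly like the sketch below. It assumes the standard rpm2cpio, cpio, and rpmbuild tools are installed. The function names are mine, and the source RPM, spec file, and work directory are placeholders, not the actual OFED file names.

```shell
#!/bin/sh
# Sketch only: function names, file names, and paths are hypothetical.

# Unwind: extract the spec file and tarball from a source RPM into a
# work directory, without installing anything.
unpack_srpm() {
    srpm=$1
    workdir=$2
    [ -f "$srpm" ] || { echo "no such source RPM: $srpm" >&2; return 1; }
    mkdir -p "$workdir" || return 1
    # Resolve an absolute path before changing directory.
    srpm_abs=$(cd "$(dirname "$srpm")" && pwd)/$(basename "$srpm")
    # rpm2cpio writes the package payload to stdout as a cpio archive.
    (cd "$workdir" && rpm2cpio "$srpm_abs" | cpio -idm --quiet)
}

# Rewind: after fixing the tarball by hand, build a new source RPM from
# the spec file and the edited sources in the same work directory.
repack_srpm() {
    workdir=$1
    spec=$2
    [ -f "$workdir/$spec" ] || { echo "no spec file: $workdir/$spec" >&2; return 1; }
    rpmbuild -bs \
        --define "_sourcedir $workdir" \
        --define "_srcrpmdir $workdir" \
        "$workdir/$spec"
}

# Typical use (placeholder names):
#   unpack_srpm example.src.rpm /tmp/srpm_work
#   ... untar the source, fix it, re-tar it under the same name ...
#   repack_srpm /tmp/srpm_work example.spec
```

Every one of those steps sits between me and a one-line source fix, which is the point.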

[yes, I tried OFED 1.3, and ran into, you guessed it, a huge number of platform assumptions … I had to comment out almost all of the “exit 1” statements in install.pl, just to get it past its (incorrect) tests. The build succeeded, but the subsequent RPM install failed … note the common issues?]
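To make the complaint about those tests concrete, here is a minimal sketch of the difference between asking the RPM database about a package name and asking the system whether the tool is actually there. The function names and the choice of gcc as the probed tool are illustrative; none of this is taken from OFED's build.sh or install.pl.

```shell
#!/bin/sh
# Sketch only: function names and the probed tool are illustrative.

# Package-name test: succeeds only if an RPM with exactly this name is
# recorded in the RPM database. A tool built from source, or packaged
# under a different name, is reported as missing.
have_package() {
    rpm -q "$1" >/dev/null 2>&1
}

# Capability test: succeeds if the tool can be found on PATH, however
# it was installed.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool gcc; then
    echo "gcc found at $(command -v gcc)"
else
    echo "gcc not found on PATH" >&2
fi
```

A build script that exits on the first kind of test fails on any system where the software is present but was not installed from the expected RPM.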


2 thoughts on “The need to keep building and packaging as separate operations”

  1. There was a “disk full” problem on the OFA server recently that may have disallowed you from getting an account. You might want to try again…? Failing that, you should definitely post to the general@lists.openfabrics.org list about your problems.

    Keep in mind that OFED *intentionally* targets specific platforms / kernels / etc. With kernel-level code, that’s the only rational thing to do.

    That being said, I agree that debugging the RPM build process is a nightmare. It “usually” works just fine on supported platforms, but is terrible if you want to build on a one-off system. FWIW, the overwhelming feedback from customers is that they want RPM-based installs; that’s why the OF community elected to have the OFED installer do this. Note that the individual source tarballs are available in the OFED tarball, so you *can* use configure/make to build and install (or just test) specific sub-packages if you want or need to.

    Good luck.

  2. @Jeff

    I have no problem with the intentional targeting of kernels; I agree with your assessment that this is the only rational thing to do.

    I have “fixed” the problematic RPMs. I hate to do this, but I will likely host a mercurial tree at http://hg.scalableinformatics.com for all this work, so that people can see the patches as well as the rest of it.

    The problem is that I am dealing with the 2.6.23 migration (some kmem_cache_create) and other fun things which resulted in an API change, as well as multiple packaging bugs. My major thesis is not that they shouldn’t use RPM; I am fine with RPM. My thesis is that they shouldn’t use RPM as the build environment. That, and they shouldn’t be testing for specific package names being in the RPM installed list as evidence of the software being available. I don’t mind (ok, a little) debugging RPMs; we deliver some of our software in RPMs on our download page (http://downloads.scalableinformatics.com). But it has to build cleanly and correctly outside of RPM *first* before we would consider creating an RPM.

    The change that frustrated me last night was the one where, after I had my “n-th” fix done, I ran into something that apparently Roland threw his hands in the air over. Some changes to the Chelsio stack resulted in another API change, which wasn’t fixable by mere editing of files to adjust argument numbers. There was supposed to be a patch, and I haven’t seen it.

    That and the bug in their packaging (an RPM issue) whereby it happily wants to create /etc files in /usr/etc (also in the mailing list archives) … The way I got around that was to, while it started compiling, get into /var/tmp/OFED/usr , then rm -fr etc ; ln -s ../etc etc

    This allowed the build (which worked) to get past the packaging error (which prevented the RPM creation).

    Happily I have IB working on the 3 machines now, 2 clients and a JackRabbit server. I am doing some iSCSI testing to compare 10 GbE vs IB with a variety of software targets.
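The /usr/etc workaround described in the second comment can be tried safely in a scratch directory. In this sketch a temporary directory stands in for the real build root (/var/tmp/OFED in the comment), and the file name example.conf is made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: a scratch directory stands in for the real build root,
# so nothing on the system is touched. example.conf is a made-up name.
root=$(mktemp -d)
mkdir -p "$root/etc" "$root/usr/etc"

# The workaround itself: replace usr/etc with a relative link to ../etc
cd "$root/usr" || exit 1
rm -fr etc
ln -s ../etc etc

# A file written through usr/etc now lands in the top-level etc
echo "demo" > "$root/usr/etc/example.conf"
ls -l "$root/etc/example.conf"
```

Because the link is relative, it keeps pointing at the build root's own etc directory even if the whole tree is moved.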
