I am working … no … struggling to “build” OFED 22.214.171.124 for our systems. OFED, for those who are not aware, is the bolus of drivers and infrastructure that supports InfiniBand. I won’t get into the IB vs 10 GbE debate here; I see room for both technologies.
There is a catch. The people who develop OFED have made specific choices that make it, generally speaking, extraordinarily hard to debug incorrect builds of OFED, as they have integrated the build process into the packaging process using RPM. It is, as far as I am aware, simply not possible to separate the two.
Which means the errors I am getting, which show up in all of 3 Google pages, need to be addressed by digging into what the build script is doing, and figuring out whether it really meant to call rpmbuild this way.
I am attempting to register for a bugzilla account at OpenFabrics, so I can report on what I find. This process has resulted in a hung web page.
Suffice it to say, it is my fervent hope that the OFED community goes back to building from tarballs and configure scripts in 1.3, and scrupulously avoids using RPM as its primary build mechanism. RPM is a package management system. It shouldn’t be used as a build system.
I am fairly certain even Red Hat agrees with this statement.
Update: So I hacked the build.sh and build_env.sh to remove the exits for ‘errors’ such as a particular rpm not being installed. That I had to do this argues strongly in favor of the thesis I had put forth. But of course, it gets worse.
Now, deep in the bowels of one of the source RPMs, we have a changed kernel interface.
/var/tmp/OFEDRPM/BUILD/ofa_kernel-126.96.36.199/drivers/infiniband/core/mad.c:2970: error: too many arguments to function ‘kmem_cache_create’
This means I have to unwind the RPM on another machine, fix the tarball, rewind the RPM, move it to this machine, and then try the rebuild.
Again, this is wrong. Very wrong.
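For anyone stuck doing the same unwind/fix/rewind dance, here is a rough sketch of the steps, written as a small Python helper that only prints the shell commands rather than running them. The file names (`ofa_kernel.src.rpm`, `ofa_kernel.spec`) and the work directory are placeholders I made up for illustration, not the actual names in the OFED tree, and the exact rpmbuild macros you need may differ on your distribution.

```python
import shlex
from pathlib import PurePosixPath


def repack_plan(srpm, workdir, spec_name):
    """Return the shell commands to unpack a source RPM and rebuild it.

    srpm      -- absolute path to the source RPM (hypothetical name)
    workdir   -- scratch directory to unpack into (hypothetical path)
    spec_name -- name of the spec file inside the source RPM (assumed)
    """
    work = PurePosixPath(workdir)
    q = shlex.quote
    return [
        # Create the scratch directory.
        f"mkdir -p {q(str(work))}",
        # Unwind: extract the spec file and tarball(s) from the source RPM.
        f"cd {q(str(work))} && rpm2cpio {q(srpm)} | cpio -idm",
        # Manual step, not scripted here: untar, fix the offending source
        # file, and re-tar under the original tarball name.
        # Rewind: build a new source RPM from the spec and patched tarball.
        "rpmbuild"
        f" --define {q('_sourcedir ' + str(work))}"
        f" --define {q('_srcrpmdir ' + str(work))}"
        f" -bs {q(str(work / spec_name))}",
    ]


if __name__ == "__main__":
    for cmd in repack_plan("/tmp/ofa_kernel.src.rpm",
                           "/tmp/ofed-work",
                           "ofa_kernel.spec"):
        print(cmd)
```

Printing the commands instead of executing them is deliberate: the tarball fix in the middle is a by-hand job, and you will want to eyeball each step before running it on a box you care about.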
[Yes, I tried OFED 1.3, and ran into, you guessed it, a huge number of platform assumptions … I had to comment out almost all of the “exit 1” lines in install.pl, just to get it past its (incorrect) tests. The build succeeded, but the subsequent RPM install failed … note the common issues?]