Half open drivers … OFED stacks with verbs ABIs that don't match the kernel's verb ABI …

I just ran through another update exercise. IB cards, OFED stack. GlusterFS atop this. The cards are a well-known vendor's cards. They work pretty well.
But …
only with very specific kernels. Other kernels need not apply.
Our kernel is pretty darned fast (so our customers tell us). Now let's build the OFED 1.5.2 … and see what happens …

To make a long story short, we wound up abandoning that approach. While we were faster in all aspects, the OFED stack wound up … somehow … having a verbs ABI that was mismatched to the kernel verbs ABI. Which meant … ib_send_bw and other things … like any verbs dependent app (like, I dunno, GlusterFS mebbe?) didn’t work.
I guess I am confused … how could the OFED stack correctly compile if the ABIs are different? Unless they were doing something funky?
Next time you build OFED by hand using install.pl, just for laughs, use it with the -vvv option. You’ll see some nice serious “funkyness”. Or look in the install.pl and you can see the “business logic” encoding to decide what does and does not get built.
This is annoying. Maybe this is why Redhat doesn’t use OFED directly, but uses the packages on their own. The driver source that isn’t in the kernel … the half open drivers I talked about in the past … these are a problem. The ABI mismatches … and building against the wrong ABI … yeah, that’s a problem too.
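For what it's worth, here is a rough sketch of how one might at least see which verbs ABI version the running kernel advertises before blaming the user space side. It assumes the kernel exposes the uverbs ABI numbers under /sys/class/infiniband_verbs (an abi_version file at the top level plus one per uverbsN device), which is where I believe libibverbs looks. The function names and the expected_core parameter are hypothetical; the expected value has to come from whatever your user space stack was built against, and this sketch does not try to discover it.

```python
from pathlib import Path


def read_uverbs_abi(sysfs_root="/sys"):
    """Read the uverbs ABI versions the kernel advertises in sysfs.

    Assumes the layout <sysfs_root>/class/infiniband_verbs/abi_version
    plus one abi_version file per uverbsN device directory.
    Returns None when the infiniband_verbs class is not present.
    """
    base = Path(sysfs_root) / "class" / "infiniband_verbs"
    core_file = base / "abi_version"
    if not core_file.is_file():
        return None
    devices = {}
    for dev in sorted(base.glob("uverbs*")):
        dev_file = dev / "abi_version"
        if dev_file.is_file():
            devices[dev.name] = int(dev_file.read_text().strip())
    return {"core": int(core_file.read_text().strip()), "devices": devices}


def core_abi_matches(info, expected_core):
    """Compare the kernel's core ABI number with a caller-supplied value.

    expected_core is hypothetical input: the version your user space
    libraries were built for.
    """
    return info is not None and info["core"] == expected_core


if __name__ == "__main__":
    info = read_uverbs_abi()
    if info is None:
        print("no infiniband_verbs class found (is ib_uverbs loaded?)")
    else:
        print("kernel core uverbs ABI:", info["core"])
        for name, version in info["devices"].items():
            print(f"  {name}: device ABI {version}")
```

Run it on the box after booting the new kernel and compare the numbers with what the OFED build expected. It won't fix anything, but it tells you quickly whether you are chasing an ABI mismatch or something else.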
Wasted nearly a day on a siCluster trying to work around these issues. Finally came up with a solution that works. Not especially happy with it, but hey, it does work.

3 thoughts on “Half open drivers … OFED stacks with verbs ABIs that don't match the kernel's verb ABI …”

  1. In principle we should complain hard to the linux-infiniband list. Once a userspace ABI is in the kernel, it must not be changed anymore. Which means whichever current userspace lib is out there available – it has to work with the in-kernel IB stack. And if it works with in-kernel IB, I cannot see why it shouldn't work with external IB…
    Now that is just theory, and I have run into such issues myself already. But that is why we should complain.
    I can understand when ABIs slightly change after years, but not from version to version.

  2. Just be aware that if you ARE using Redhat, they are still shipping a broken OpenSM. 3.3.3 has a bug for users of IB attached storage or iSCSI hosts needing SRP.

    • @Ed
      (congrats on the move to OSC btw).
      Yeah, IB drivers are causing me grief. I am thinking about doing our own build, so that the kernel ABI matches the user space ABI. The QIB driver issue in OFED means we are stuck using ancient kernels with modern user space IB, but we can’t use bugfixed modern kernels with modern user space, because the drivers in the modern kernels are ancient and the drivers in the ancient kernels are the modern ones.
      In a word, ugh.
      I have to ask whether IB vendors actually want to make a stronger case than 10GbE vendors? I am not convinced of this. The world doesn’t only use Redhat kernels or user space.
