Udev is the tool that populates /dev. It enables devices to be hotplugged, and it adapts the system to those changes by running commands and scripts.
Udev runs at boot. It also keeps running in the background, listening for kernel device events and handling changes as they occur.
Except, every now and then, something goes awry with udev. Like, it hangs.
So booting stops. Cold. With no way around it.
Sort of our BSOD. Just as inconvenient.
What you can do about it is fairly interesting.
1) You can turn udev off. This probably makes sense for most systems where you really won't be hotplugging, and where the hardware won't change much between boots.
2) You can tell udev to time out by adding udevtimeout=seconds to the boot line. Of course, some of your device drivers may then not finish initializing properly.
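To check that the timeout actually made it onto the boot line, you can read the running kernel's command line from /proc/cmdline. Here is a minimal Python sketch that pulls out the udevtimeout= value. The function name is hypothetical, and it assumes the parameter is spelled exactly udevtimeout= and takes whole seconds, as described above.

```python
from typing import Optional


def udev_timeout_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the udevtimeout= value (in seconds) from a kernel command line.

    Returns None if the parameter is absent or its value is not a whole number.
    Assumes space-separated parameters, which is how /proc/cmdline reads.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "udevtimeout" and sep and value.isdigit():
            return int(value)
    return None
```

On a running system, pass it the contents of /proc/cmdline. A result of None means the parameter was not given on this boot.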
3) You can google and discover that udev hangs are invariably device drivers initializing badly, but … you have no way to tell which device drivers. Nice … huh? You can edit /etc/udev/udev.conf and set udev_log="debug" to make it very verbose, and it spits out lots of output.
What occurs to me is that since the boot waits for udev to work through its event queue, when one event hangs, everything queued behind it is also toast.
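To make the point concrete, here is a toy Python model of a serial event queue. It is not how udevd is actually implemented. Each event handler runs as a child process, and any handler that exceeds a per-event timeout is killed and recorded by name instead of stalling the rest of the queue. The event names, the `run_queue` and `python_handler` helpers, and the sleeping stand-in for a hung driver are all hypothetical.

```python
import subprocess
import sys


def python_handler(code: str) -> list:
    """Build an argv that runs a snippet of Python as a stand-in event handler."""
    return [sys.executable, "-c", code]


def run_queue(events, timeout):
    """Run each (name, argv) handler in order, with a per-handler timeout.

    Returns (completed, timed_out) as lists of event names. A handler that
    runs longer than `timeout` seconds is killed by subprocess.run and
    recorded, and the queue moves on to the next event.
    """
    completed, timed_out = [], []
    for name, argv in events:
        try:
            # check=False: a handler that exits with an error still counts as
            # finished; only hangs are of interest here.
            subprocess.run(argv, timeout=timeout, check=False)
            completed.append(name)
        except subprocess.TimeoutExpired:
            # The child has already been killed and reaped at this point.
            timed_out.append(name)
    return completed, timed_out
```

Without the timeout, the sleeping handler would block every event behind it. With the timeout, those events still run, and the name in `timed_out` answers the "which driver?" question that a plain hang leaves open.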
How is this a good thing?
If we can’t easily figure out what is broken, we don’t stand much of a chance of fixing it.
I'd probably recommend udevtimeout from now on, just to give users a fighting chance of using their hardware and running diagnostics in the face of buggy drivers.
All drivers are buggy. All hardware is buggy. We just need reasonable responses from systems to these bugs. Udev needs to be fixed so it doesn’t result in non-functional systems … as it does now.
FWIW, this happened after using the Mellanox OFED stack tools, which are very picky about the kernel. After installation, they appear to damage the udev rules. I'm not sure I can easily fix this damage, but I will try. A broken rule means a broken system.