IPMI is (sometimes) a wonderful thing. It can help you figure out problems, give you a console over the network, and power cycle machines.
This is, of course, when it works.
When it doesn’t, it is a nightmare.
We have a cluster in place with a mostly functional IPMI stack. The customer reported a problem with a node, and we brought it back to the lab. Turns out that during a recent move of theirs, they damaged a network port on it. Ok, use the other network, no problem.
In the lab, I booted it into a 126.96.36.199 diskless SuSE kernel (with all sorts of goodness in this kernel) to check it out. I want to make sure the IPMI is working, so I run ipmitool locally:
ipmitool -I open sdr
which basically means: spit back your sensors to me, please. When it works, it looks something like this:
root@dragonfly:~# ipmitool -I open sdr
ambienttemp | 28.20 degrees C | ok
bulk.v12-0-s0 | 12.06 Volts | ok
bulk.v3_3-s0 | 3.36 Volts | ok
bulk.v3_3-s5 | 3.24 Volts | ok
bulk.v5-s0 | 5.04 Volts | ok
bulk.v5-s5 | 5.04 Volts | ok
cpu0.dietemp | 44.40 degrees C | ok
cpu0.memtemp | 30 degrees C | ok
cpu0.vcore-s0 | 1.44 Volts | ok
cpu0.vldt2 | 1.20 Volts | ok
cpu1.dietemp | 45.60 degrees C | ok
cpu1.memtemp | 31.20 degrees C | ok
cpu1.vcore-s0 | 1.44 Volts | ok
fan1.tach | 9960 RPM | ok
fan2.tach | 10080 RPM | ok
fan3.tach | 10620 RPM | ok
fan4.tach | 10140 RPM | ok
fan5.tach | 10320 RPM | ok
fan6.tach | 10500 RPM | ok
gbeth.temp | 37.20 degrees C | ok
hddbp.temp | 28.80 degrees C | ok
sp.temp | 37.80 degrees C | ok
cpu0.mem0 | Not Readable | ns
cpu0.mem1 | Not Readable | ns
cpu0.mem2 | Not Readable | ns
cpu0.mem3 | Not Readable | ns
cpu1.mem0 | Not Readable | ns
cpu1.mem1 | Not Readable | ns
cpu1.mem2 | Not Readable | ns
cpu1.mem3 | Not Readable | ns
Event Logging | 0x00 | ok
Which is nice. This is actually quite helpful.
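If you want to do more with that listing than eyeball it, the three pipe-separated columns are easy to pick apart. Here is a minimal sketch, assuming the output always has the name | reading | status shape shown above. The names SensorReading, parse_sdr and not_ok are made up for illustration and are not part of ipmitool.

```python
from typing import List, NamedTuple


class SensorReading(NamedTuple):
    name: str
    reading: str
    status: str


def parse_sdr(text: str) -> List[SensorReading]:
    """Split 'name | reading | status' lines into records."""
    readings = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # skip prompts, blank lines, anything not sensor-shaped
        readings.append(SensorReading(*parts))
    return readings


def not_ok(readings: List[SensorReading]) -> List[SensorReading]:
    """Return the sensors whose status column is anything other than 'ok'."""
    return [r for r in readings if r.status != "ok"]


# A few lines copied from the listing above, used as sample input.
SAMPLE = """\
ambienttemp | 28.20 degrees C | ok
fan1.tach | 9960 RPM | ok
cpu0.mem0 | Not Readable | ns
Event Logging | 0x00 | ok
"""

if __name__ == "__main__":
    for sensor in not_ok(parse_sdr(SAMPLE)):
        print(sensor.name, sensor.reading, sensor.status)
```

Feeding it real data would just mean capturing the output of the ipmitool command above and passing that text to parse_sdr.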
The problem is when it doesn’t work. Then you can’t get to the IPMI through the machine, which is exactly what you need to do to configure it.
dmidecode is helpful on a working machine:
IPMI Device Information
Interface Type: KCS (Keyboard Control Style)
Specification Version: 1.5
I2C Slave Address: 0x10
NV Storage Device: Not Present
Base Address: 0x0000000000000CA2 (I/O)
Register Spacing: Successive Byte Boundaries
Handle 0x002B, DMI type 127, 4 bytes
Sadly, the ipmi_si module can’t seem to find the IPMI unit, even though the BIOS reports it as KCS type (and reports it as working on the summary screen as well).
Of course, without this, we don’t have an out-of-band control for these units …
Everything else works … just not this.
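One avenue that might be worth a try, given that dmidecode reports a KCS interface at I/O address 0xCA2: the ipmi_si driver documents type and ports (or addrs, for memory-mapped interfaces) module parameters, so the values can be handed to it explicitly rather than leaving it to probe. Whether that helps on this particular board is untested here. The sketch below only builds the modprobe command line from dmidecode text; the function name ipmi_si_args is made up, and it assumes dmidecode labels the base address with (I/O) or (Memory-mapped) as in the output above.

```python
import re
from typing import List


def ipmi_si_args(dmi_text: str) -> List[str]:
    """Build a modprobe argv for ipmi_si from a dmidecode IPMI record."""
    iface = re.search(r"Interface Type:\s*(\w+)", dmi_text)
    base = re.search(
        r"Base Address:\s*(0x[0-9A-Fa-f]+)\s*\((I/O|Memory-mapped)\)", dmi_text
    )
    if not iface or not base:
        raise ValueError("no IPMI interface type or base address found")
    address = int(base.group(1), 16)
    # ipmi_si takes "ports" for I/O-space addresses, "addrs" for memory-mapped.
    key = "ports" if base.group(2) == "I/O" else "addrs"
    return [
        "modprobe",
        "ipmi_si",
        f"type={iface.group(1).lower()}",
        f"{key}={address:#x}",
    ]


# The record from the dmidecode output above, used as sample input.
DMI_SAMPLE = """\
IPMI Device Information
Interface Type: KCS (Keyboard Control Style)
Specification Version: 1.5
I2C Slave Address: 0x10
NV Storage Device: Not Present
Base Address: 0x0000000000000CA2 (I/O)
Register Spacing: Successive Byte Boundaries
"""

if __name__ == "__main__":
    # Only prints the command; running it (as root) is left to the operator.
    print(" ".join(ipmi_si_args(DMI_SAMPLE)))
```

The command is printed rather than executed on purpose, so it can be checked against the BIOS summary screen before anything gets loaded.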