Fixing pausing Nehalem/Westmere units

Some Nehalem and Westmere units have … er … interesting unintended features … yeah, that’s the politically correct way to say it. We like Intel and their products (and we’ve liked AMD and their products in the past). But we gotta call this one.
As you watch dstat output, you see these occasional … hangs … for a few seconds. As if someone is monkeying with the clock.
And that is, to a degree, what appears to be happening. The TSC (time stamp counter) isn’t stable as a clock source, so you need to switch to a different clock source. You generally have 3 options: tsc, hpet, and acpi_pm.
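If you want to check whether the kernel itself has noticed the TSC misbehaving, a rough sketch like the following can help. The function name is made up for illustration, and the grep pattern is a loose guess, since the exact wording of these kernel messages differs between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: print kernel log lines that suggest a clocksource
# (typically the TSC) was flagged as unstable.
# Reads log text on stdin, so it works with dmesg output or a saved log file.
# The pattern is an assumption; message wording varies by kernel version.
clocksource_unstable_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -iE '(tsc|clocksource).*unstable' || true
}

# Usage (reading the kernel log may need root on some systems):
#   dmesg | clocksource_unstable_lines
#
# To see what is in use and what the kernel offers:
#   cat /sys/devices/system/clocksource/clocksource0/current_clocksource
#   cat /sys/devices/system/clocksource/clocksource0/available_clocksource
```

No output from the filter does not prove the TSC is fine. It only means the kernel did not log anything that matches the pattern.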
So we’ve found that a simple
echo "acpi_pm" > /sys/devices/system/clocksource/clocksource0/current_clocksource
does a pretty good job of fixing some of the weird latency. But under heavy loads, we see more latency.
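A slightly more defensive version of that one-liner, as a minimal sketch: it only writes the new clock source if the kernel lists it as available. The function name and the CS_DIR variable are hypothetical, introduced here so the sysfs directory can be overridden. Writing to current_clocksource needs root, and the setting does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper around the sysfs clocksource interface.
# CS_DIR is an illustrative override; it defaults to the path used above.
CS_DIR="${CS_DIR:-/sys/devices/system/clocksource/clocksource0}"

set_clocksource() {
    want="$1"
    # available_clocksource holds a space-separated list, e.g. "tsc hpet acpi_pm".
    avail=$(cat "$CS_DIR/available_clocksource") || return 1
    for cs in $avail; do
        if [ "$cs" = "$want" ]; then
            # Same write as the echo above; fails without root.
            echo "$want" > "$CS_DIR/current_clocksource" || return 1
            return 0
        fi
    done
    echo "clocksource '$want' not available (have: $avail)" >&2
    return 1
}

# Usage (as root):
#   set_clocksource acpi_pm
#   cat "$CS_DIR/current_clocksource"
```

Checking the available list first just gives a clearer error message than a failed write when, say, hpet is disabled in the BIOS.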
Honestly, I think the problem is in silicon. Newer revisions of chipsets have exhibited it more clearly than the older sets. Very annoying.
Unfortunately, as indicated, it shows up under load, such as when work has to get done in an interrupt service routine (ISR) that is blocked for some reason while interrupts are turned off. This shouldn’t be … ISRs should do as little work as possible, and never spin/sleep. Especially never spin/sleep with interrupts turned off, because that holds up every other interrupt. Like clock timers.
This is what’s happening.
So, how to fix it?

2 thoughts on “Fixing pausing Nehalem/Westmere units”

1. I recently did some FhGFS profiling using the ‘perf’ tool and it turned out lots of CPU time was due to ACPI timer requests. Just by switching to HPET we could reduce CPU load by 30%. TSC works even better, but is marked unstable on our benchmarking system. The default was acpi_pm, as hpet was not enabled by the BIOS and I had to force the kernel to use it.
So if you are really going to use acpi_pm, you probably want to carefully benchmark the negative effects…

2. @Bernd
We care less about benchmark impacts and more about the real workloads. tsc varies in stability from boot to boot, which suggests something amiss with the silicon. HPET is good, but we did see timer tick misses even with that. The big issue, though, appears to be how IRQs are routed (or misrouted) by the system. We’ve been able to reliably get rid of the timer tick and pausing issue by using the aforementioned boot options. Rather odd that this should work, but new silicon introduces new issues.