Ubuntu kernels: is anyone paying attention???

I have noticed this now on my laptop, on JackRabbit, and on a few other systems. The Ubuntu kernels are thrashing with context switches, 4000-5000 or so per second, and fast machines are rendered sluggish.
So we build our own. Did that for Ubuntu thus far, and it has been good. Context switches per second down around 100 or so at idle. Where they should be.
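
For anyone who wants to check their own machine, here is a minimal sketch of how the context switch rate can be measured. It assumes a Linux system where /proc/stat carries a cumulative "ctxt" counter; the function names and the 5 second default interval are made up for illustration, and this is not the tool I used for the numbers above. The "cs" column of `vmstat 1` should report roughly the same thing.

```python
import time


def read_ctxt(stat_text):
    """Return the cumulative context switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The line of interest looks like: "ctxt 123456789"
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def switches_per_second(before, after, seconds):
    """Turn two cumulative counts into an average rate."""
    return (after - before) / seconds


def sample(interval=5.0, path="/proc/stat"):
    """Read the counter twice, `interval` seconds apart (hypothetical helper)."""
    with open(path) as f:
        before = read_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_ctxt(f.read())
    return switches_per_second(before, after, interval)
```

Calling `sample()` on an otherwise idle box gives a single number to compare between two kernels booted on the same install.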

I just wonder if anyone at Canonical is paying attention to this. I will file a bug report later on.
As a rule of thumb, if your kernel is thrashing, driving your CPU temperature up when you are idling, you are probably, very likely, doing something horribly wrong. Best guess would be one of their patches hosed the system somehow. This was with both generic and rt kernels.
I used Gutsy beta 1 as the test bed, as I wanted my compiz back. Had it on SuSE, want it on Ubuntu. For reasons I didn’t grasp, it was horribly completely broken in 7.04. It sorta kinda works in 7.10. Takes a little finesse.
This almost makes me want to try SuSE 10.3 on the laptop. With Ubuntu, most everything just works, though on this laptop, the new nVidia chips and the Intel wireless driver didn’t in 7.04, nor did the sound. I fixed all of those with updated drivers and a custom kernel build. This kernel is the one which is very well behaved. After wiping the old system and loading 7.10, the stock kernel and the system are sluggish, with a huge number of context switches.
On our other systems with Ubuntu, without our kernel, they are sluggish. With our kernel they are peppy.
Someone at Canonical is not building good kernels.

2 thoughts on “Ubuntu kernels: is anyone paying attention???”

  1. Have you compared (or posted) your kernel configs? The Linux PowerTop project has flushed out a number of bad drivers and settings which cause frequent polling; I suspect that Ubuntu will pick these up over time, but it might be worth seeing if e.g. there’s a reasonably low-pain change to bring that idle use down.

  2. @Chris:
    Not yet, will do that soon. Building 2.6.23 debs right now with NFS patches, updated Areca and AoE, and some other goodness for testing/use.
    From what I can see, this could be a driver issue. In the laptop unit, when the network is hit hard, the csw count goes up rapidly. I am wondering if some change was made so that it is always polling the network (this is a tg3 driver). Tg3 is generally a terrible driver; the bcm* was far better (fewer context switches, lower interrupt usage). Unfortunately, the NIC is built onto the motherboard. I try to avoid Broadcom NICs whenever possible, but in this case, it is hard to ignore as it is on a laptop.
    Right now while idling (running some network applications that are polling slowly), we are getting about 300-600 csw/sec. This is ok. When there is serious network traffic, we get closer to 2500-4000, which is in line with what we see under heavy load. This is my older kernel (Scalable: includes NFS, AoE, Areca, … patches).
    The Gutsy kernel has a sustained 5000+ csw/sec no matter what I do. My old kernel on top of the Gutsy install shows generally good behavior, and is quite snappy. The Gutsy kernel atop the Gutsy install does not feel this way. The laptop feels sluggish with the Gutsy kernel.
    What’s odd is that they are both 2.6.22.*, though my kernel is .6 and theirs is later.
    Is there a kernel “top” utility that can be used to ferret this out?
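
On the config comparison suggested in the first comment: below is a rough sketch of how two kernel config files could be diffed. It assumes the usual format (`CONFIG_FOO=value` for set options, `# CONFIG_FOO is not set` for disabled ones) and, as a simplification, treats an option missing from one file the same as "not set". The function names are hypothetical, and the config locations (something like /boot/config-<version>) will depend on how each kernel was built and installed.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each option name to its value; disabled options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    changed = {}
    for key in sorted(set(a) | set(b)):
        # Assumption: an option absent from a file counts as disabled.
        va, vb = a.get(key, "n"), b.get(key, "n")
        if va != vb:
            changed[key] = (va, vb)
    return changed
```

Feeding it the contents of the two config files and looking first at timer and driver related options (CONFIG_HZ, CONFIG_NO_HZ, CONFIG_TIGON3 and the like) would be one way to narrow down where the two 2.6.22 builds differ.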
