Not what I was hoping for. I may explain more of what I am doing later (less interesting than why I am doing it), but suffice it to say that I’ve got a machine I’ve turned into a VM/container box, so I can build something I need to build.
This box has a large RAID6 for storage. Spinning disk. Fairly well optimized, I get good performance out of it. The box has ample CPU, and ample memory.
The VM bulk storage points at the spinning disk RAID6, not the SSD RAID10.
I noted a failing drive, so I ejected it and swapped it out for a working one. RAID rebuild started, and now I’ve got another couple of hours before it finishes. 6 VMs are consuming maybe 25% of the CPU cycles when busy, and about 25% of the RAM in total. The machine is otherwise idle.
And when I log into one of the VMs, I am getting dramatic pauses, while there is no real load going on. Nothing in the process table. Yet the load average is wound up a little, which usually happens when IO is paused.
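For anyone who wants to check the same symptom: on Linux the load average counts tasks in uninterruptible sleep (state `D`, usually waiting on IO) as well as runnable ones, which is why it can climb while the CPU sits idle. Here is a minimal sketch, assuming a Linux box with `/proc` mounted, that lists processes currently in state `D`. The helper names `parse_stat` and `blocked_tasks` are mine, purely for illustration, and the scan covers processes only, not individual threads.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren].strip())
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the entry was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    # Print the raw load average line, then anything blocked right now.
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    for pid, comm in blocked_tasks():
        print(f"D state: pid={pid} comm={comm}")
```

This is a single snapshot, so short stalls are easy to miss. Running it a few times during a pause gives a better picture of what is stuck waiting.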
Sure enough … this looks like what is happening. I am going to explore these code paths somewhat more. Fairly modern 4.4.x kernel, so it's not likely a long-looming bug of the 3.10/3.16 variety.