big memory machines

Haven’t finished debugging this unit yet. Thought you might like to see top info. These are physical CPUs BTW, not SMT.

top - 09:21:29 up 3 min,  2 users,  load average: 0.22, 0.21, 0.09
Tasks: 219 total,   1 running, 218 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.7%us,  0.3%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us, 13.2%sy,  0.0%ni, 86.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu21 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu22 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu24 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu25 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu26 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu27 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu28 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu29 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu30 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu31 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  529366036k total,  9816336k used, 519549700k free,        0k buffers
Swap:        0k total,        0k used,        0k free,    70116k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 2126 root      39  19     0    0    0 S   13  0.0   0:08.42 kipmi0             
 1882 root      20   0 15780  736  520 S    0  0.0   0:00.17 irqbalance         
 2388 root      20   0 79340 3760 2944 S    0  0.0   0:00.05 sshd               
 2467 root      20   0 19476 1520 1068 R    0  0.0   0:00.05 top                
    1 root      20   0 24008 2196 1340 S    0  0.0   0:07.03 init               
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd           
    3 root      20   0     0    0    0 S    0  0.0   0:00.05 ksoftirqd/0        
    4 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/0:0        
    5 root      20   0     0    0    0 S    0  0.0   0:00.14 kworker/u:0        

ahhhh

It really has 1TB; I probably need some boot options or some other bits to get it to see all the RAM.
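
As a rough check on how much of that 1TB the kernel actually sees, here is a minimal Python sketch that converts the `Mem:` total from the top output above. It assumes top's `k` suffix means KiB and that "1TB" means 1024 GiB installed; the constant and function names are made up for illustration.

```python
# Total from the "Mem:" line of the top output above (assumed to be KiB).
MEM_TOTAL_KIB = 529_366_036

# Assumption: the "1TB" of installed RAM is 1 TiB, i.e. 1024 GiB.
INSTALLED_GIB = 1024


def kib_to_gib(kib: int) -> float:
    """Convert a KiB count to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


visible_gib = kib_to_gib(MEM_TOTAL_KIB)
missing_gib = INSTALLED_GIB - visible_gib
fraction_visible = visible_gib / INSTALLED_GIB

print(f"visible to the kernel: {visible_gib:.1f} GiB")
print(f"not showing up:        {missing_gib:.1f} GiB")
print(f"fraction visible:      {fraction_visible:.1%}")
```

Under those assumptions the kernel is seeing roughly 505 GiB, a bit under half of what is installed.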


11 thoughts on “big memory machines”

  1. @kirjoittaessani

    Sadly, something like 1/2 the memory isn’t showing up. I’ll have to run into the lab today and test the RAM. I am guessing a mixture of bad DIMMs and memory cards. Ugh.

  2. As I said in an earlier comment, in case you missed it, there are some pretty serious (in my opinion) issues with anyone reading /proc on kernels from 2.6.32 forward, and I wrote it up here – http://collectl.sourceforge.net/SlowProc.html

    If this includes your system perhaps you can try out my ‘strace -c’ test and confirm you’re seeing this issue too.

    -mark

  3. @Mark

    Good catch there … I am wondering if this is what I’ve been running into with Collectl on our 2.6.32 kernels.

    Ok … this smells like a /proc – NUMA problem. The CPUs handling the /proc interface could be different, so it’s possible that reads are causing all sorts of joyous access issues.

  4. re newer kernels – I believe it still is a problem. Nevertheless it would be good to test yourself if you have access to a many-core box.

    joe – it would be very interesting to see if this is what you’re bumping into. Can you try some of the tests I outlined on that web page?

    I too thought it was a NUMA issue, but I think it’s more of an issue handling all the locking on the different memory sections one needs to traverse with a lot of cores. While it turns out you can’t have a lot of cores without a lot of sockets and hence NUMA, it’s not really the NUMA code that is doing this. At least that’s my understanding.

    -mark

  5. @Mark – not sure you know, but the Red Hat bugzilla was locked down a few months ago to prevent access by non-subscribed people to bugs, apparently for “security reasons” (my understanding based on what RH told me happened to a bug of ours). So the BZ you link to from your collectl page is not viewable by anyone else I’m afraid.

    Is there a discussion on LKML about this kernel regression?
