If you don’t know what I am talking about here, that’s fine. I’ll assume you don’t do hardware, or you call someone else when there is a hardware problem.
If you think “well gee, don’t we have lspci? so why do we need this?” then you probably have not really tried to use lspci to find this information, or didn’t know it was available.
Ok … what I am talking about.
When a PCIe bus comes up, the devices negotiate with the PCIe hub. They negotiate width (how many lanes of PCIe they will consume), speed (signalling speed in terms of GT/s), interrupts, etc. The PCIe hub presents this information to the OS, though in some cases, OSes like Linux choose to enumerate/walk the PCIe tree themselves … because … you know … BIOS bugs.
Ok. So these devices all autonegotiate as part of their initialization. Every now and then, you get a system where a card autonegotiates speeds or widths lower than expected. The driver generally provides the information on what it is capable of, and the PCIe hub, or OS structure, tells you the actual state.
Why is this important … you might rightly ask?
So you have two machines. They connect over a simple network. The network speed is lower than the PCIe speed when the unit is operating at full capability. One simple estimate of the maximum possible speed of a PCIe link is to take the number of GT/s and multiply it by the width (the number of lanes). Divide that by 10. That is your approximate bandwidth in GB/s.
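The rule of thumb above is easy to turn into a few lines of Python. This is a minimal sketch, not anything taken from an existing tool; the function name is made up for illustration. Dividing by 10 matches the 8b/10b encoding used on 2.5GT/s and 5GT/s links (10 bits on the wire per byte of data). 8GT/s links use 128b/130b encoding, so for those the estimate comes out somewhat low.

```python
def approx_pcie_bandwidth_gb_per_s(speed_gt_per_s, width_lanes):
    """Rule-of-thumb estimate from the text: GT/s times lanes, divided by 10."""
    return speed_gt_per_s * width_lanes / 10.0


# A few speed/width combinations to compare against what your card needs.
for speed, width in [(2.5, 1), (5.0, 4), (5.0, 8), (8.0, 8)]:
    estimate = approx_pcie_bandwidth_gb_per_s(speed, width)
    print(f"{speed} GT/s x{width}: ~{estimate:.2f} GB/s")
```

Real throughput will be lower than the estimate once protocol overhead is counted, so treat the number as a ceiling.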
So your two machines have fast network cards (note this also works for HBAs … heck … everything … though … be careful about the power control systems, as they may mess with some of these things). You start using iperf to generate traffic between the two machines. And you see it is way below where you expect.
So, you start looking for why this is the case.
Latest drivers: check
Up to date kernel: check
Switch behaving well: check
Hmmm …. Something is amiss.
Then you try between other machines. Every other machine to the non-suspect machine is giving you reasonable numbers.
The suspect machine is giving you crappy numbers to/from it.
In the network scenario, you also see many errors/buffer overruns. Which means that the kernel can’t empty/fill buffers fast enough. Which suggests some odd speed issue.
Ok … where do you look, and what do you look for?
Pat yourself on the back if your hand shot up and you said, with confidence, ‘lspci’. Or parsing the /sys/… tree by hand. Either will work. Let’s focus on lspci for the moment.
Ok, great. Now what information within lspci output do you want, and which options do you use?
The lspci documentation suggests -m or -mm for machine parseable output.
I am going to avoid those options. Try them, and see why for yourself.
You see, to get the juicy bits you need, you will need to give 3 v’s: -vvv. And to get a little more info, add -kb to get driver and other info.
Now, look at that joyous output. Again, what info do you need?
Look at LnkCap: and LnkSta:
That’s what you need.
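Pulling those two lines out of the -vvv output can be sketched in Python as below. This is an illustration under assumptions, not the code of the tool shown later: the function name and regexes are mine, and SAMPLE is a trimmed, hand-written excerpt in the shape lspci prints, not captured output.

```python
import re

# Device header lines start in column 0 with bus:device.function,
# optionally prefixed by a PCI domain such as 0000:.
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# Indented capability lines carrying the link capability and link status.
LINK_RE = re.compile(r"^\s+(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")

# Hypothetical, trimmed sample for illustration only.
SAMPLE = """\
03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2308 (rev 05)
        LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s
        LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+
82:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2)
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1
        LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+
"""


def parse_links(lspci_text):
    """Return {pci_id: {"LnkCap": (speed, width), "LnkSta": (speed, width)}}."""
    links = {}
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        link = LINK_RE.match(line)
        if link and current is not None:
            kind, speed, width = link.groups()
            links.setdefault(current, {})[kind] = (float(speed), int(width))
    return links


for pci_id, info in parse_links(SAMPLE).items():
    print(pci_id, "can do", info.get("LnkCap"), "is doing", info.get("LnkSta"))
```

On a live system you would feed it the text from something like `subprocess.run(["lspci", "-vvv"], capture_output=True, text=True).stdout`, run as root so the capability lines are readable. Lines where the speed is reported as unknown simply do not match and are skipped.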
Wouldn’t it be nice if this were output in a nice simple, tabular form … so you could … I dunno … see your problem right away?
Well, your long wait is over! For only $19.95, and a quick trip to github.com, you too can grab all this info incredibly quickly. Don’t believe me? Well then, have a gander:
landman@leela:~/work/development/pcilist$ sudo ./pcilist.pl
PCIid    MaxWidth  ActWidth  MaxSpeed  ActSpeed  driver         description
00:00.0  4         0         5                                  Intel Corporation Haswell-E DMI2 (rev 02)
00:01.0  8         0         8         2.5       pcieport       Intel Corporation Haswell-E PCI Express Root Port 1 (rev 02) (prog-if 00 [Normal decode])
00:02.0  8         1         8         2.5       pcieport       Intel Corporation Haswell-E PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
00:02.2  8         8         8         8         pcieport       Intel Corporation Haswell-E PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
00:03.0  8         0         8         2.5       pcieport       Intel Corporation Haswell-E PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
00:03.2  8         8         8         5         pcieport       Intel Corporation Haswell-E PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
00:1c.0  1         0         5         2.5       pcieport       Intel Corporation Wellsburg PCI Express Root Port #1 (rev d5) (prog-if 00 [Normal decode])
00:1c.4  4         1         5         2.5       pcieport       Intel Corporation Wellsburg PCI Express Root Port #5 (rev d5) (prog-if 00 [Normal decode])
02:00.0  1         1         2.5       2.5       snd_hda_intel  Creative Labs SB Recon3D (rev 01)
03:00.0  8         8         8         8         mpt3sas        LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
05:00.0  8         8         5         5         ixgbe          Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
05:00.1  8         8         5         5         ixgbe          Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
07:00.0  1         1         2.5       2.5                      ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03) (prog-if 00 [Normal decode])
80:02.0  8         8         8         8         pcieport       Intel Corporation Haswell-E PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
80:03.0  16        16        8         2.5       pcieport       Intel Corporation Haswell-E PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
82:00.0  16        16        8         2.5       nvidia         NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2) (prog-if 00 [VGA controller])
82:00.1  16        16        8         2.5       snd_hda_intel  NVIDIA Corporation Device 0fbc (rev a1)
Notice here how the NVidia card throttled down. When you start using it actively, it throttles up in speed.
But, if you have a nice 40GbE card, say an mlx4_en based card, and you see 5GT/s and x4 on the width, that gets you to about 2GB/s maximum, well short of the roughly 5GB/s that 40Gb/s works out to. So you’ll see somewhat less than 2GB/s on your network.
This is what I saw today. And I wanted to make it easy to spot going forward.
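Spotting the problem comes down to comparing the maximum and actual columns for each device. Here is a minimal Python sketch of that comparison. The Link tuple and the function name are hypothetical, the rows are copied from a few lines of the listing above, and skipping links that report a width of 0 rests on my assumption that nothing is connected there.

```python
from collections import namedtuple

# Hypothetical record type mirroring the columns of the listing.
Link = namedtuple("Link", "pci_id max_width act_width max_speed act_speed driver")

# A handful of rows taken from the listing above.
ROWS = [
    Link("00:01.0", 8, 0, 8.0, 2.5, "pcieport"),
    Link("00:02.0", 8, 1, 8.0, 2.5, "pcieport"),
    Link("03:00.0", 8, 8, 8.0, 8.0, "mpt3sas"),
    Link("05:00.0", 8, 8, 5.0, 5.0, "ixgbe"),
    Link("82:00.0", 16, 16, 8.0, 2.5, "nvidia"),
]


def degraded(rows):
    """Return the links running narrower or slower than they are capable of."""
    found = []
    for row in rows:
        if row.act_width == 0:
            # Assumption: a width of 0 means the link is not up at all.
            continue
        if row.act_width < row.max_width or row.act_speed < row.max_speed:
            found.append(row)
    return found


for row in degraded(ROWS):
    print(f"{row.pci_id} ({row.driver}): x{row.act_width} of x{row.max_width}, "
          f"{row.act_speed} of {row.max_speed} GT/s")
```

A flagged device is a lead, not a verdict. As with the NVidia card, a link can sit at a lower speed while idle and step back up under load, so check again while the device is busy.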