Long story, but it was a time-sensitive POC bug. I like the switch I was using, but we needed this up ASAP. Customer was waiting.
So I yanked all the 40GbE cards from the servers, put in multiport 10GbE cards, and set up 802.3ad LAGs.
Then moved to the Arista in the lab (great switch BTW). It's been years since I set one up, so out came the manual. Read up on setting up the LAGs and port channels … I had forgotten why I liked using them so much. So bloody easy.
Start the 8 client LAGs, the 2 server LAGs, turn on the PFS, run tests …
… and no asymmetric bug. It's gone.
Run some sanity checks: yes, we are getting good performance, though I know I am going to be write limited (LAGs are not faster pipes, since any single flow still rides one member link, but they can help aggregate many threads across several smaller pipes).
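
To make the "not faster pipes" point concrete, here is a toy Python sketch of how a LAG spreads traffic. It assumes a simple per-flow hash over addresses and ports; the CRC32 hash, the 4-link LAG, and the IPs and ports are all made up for illustration and are not what the Arista or the NICs actually use. The idea it shows is only this: one flow always lands on one member link, while many flows get spread over the members.

```python
import zlib
from collections import Counter

LINK_GBPS = 10  # speed of each member link (10GbE, as in the post)


def pick_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Map a flow to one LAG member link.

    Toy hash for illustration only; real switches and bonding
    drivers use their own configurable hash policies.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_links


def flows_per_link(flows, n_links):
    """Count how many flows the hash places on each member link."""
    counts = Counter(pick_link(*flow, n_links) for flow in flows)
    return [counts.get(i, 0) for i in range(n_links)]


def max_aggregate_gbps(flows, n_links):
    """Upper bound on throughput: only links carrying a flow contribute."""
    used = sum(1 for c in flows_per_link(flows, n_links) if c > 0)
    return used * LINK_GBPS


# Hypothetical traffic: one big stream vs. 64 parallel streams
# that differ only in source port.
single = [("10.0.0.1", "10.0.0.100", 40000, 2049)]
many = [("10.0.0.1", "10.0.0.100", 40000 + i, 2049) for i in range(64)]

print("single flow, max Gb/s:", max_aggregate_gbps(single, 4))
print("64 flows per link:    ", flows_per_link(many, 4))
print("64 flows, max Gb/s:   ", max_aggregate_gbps(many, 4))
```

The single stream can never exceed one link's 10 Gb/s no matter how many ports are in the LAG, which is why a parallel file system with many client threads is a good fit and a single fat writer is not.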
Overall performance is within 10% of theoretical max on this HW with this network, so I am fine with it. Hope the customer is as well.