Issue with Chelsio T320 10G NIC and xcp-ng (Solved)

TLDR: The driver used by xcp-ng for the Chelsio T320 10G NIC has a bug when offloading is enabled and traffic is routed through to a guest VM. This doesn’t affect using an xcp-ng storage repository, but it shows up when mounting a share inside a guest VM across the NIC.

I wanted to share a problem I discovered while testing a Chelsio T320 10G NIC installed in a Dell R710 running xcp-ng. I followed Tom’s YouTube videos on getting everything set up (I mean everything! FreeNAS, xcp-ng, 10G cards, the whole shebang) and it appeared to be working just like he demonstrated. I was able to run iperf3 from dom0 on the xcp-ng server to the FreeNAS server and get 9.98 Gb/s each way. Awesome! I then set up iSCSI and NFS shares and tested them as storage repositories by installing a guest VM and running phoronix. I got read/write performance very similar to Tom’s video.

The issue I had was when I tried to connect between the FreeNAS server and a guest VM. Using iperf3 I would see 10 Gb/s performance in one direction, but less than 100 Mb/s in the other direction. I tried CentOS, Debian and Ubuntu VMs and had the same issue on all 3. I googled around and found this guide to tweak the network system settings. Still pretty terrible performance, nothing close to 10 Gb/s. I spent the better part of two days trying various hacks but nothing worked. The closest I could get was running iperf3 with lots of parallel client streams, but that was very finicky and depended on the window size in iperf3 as well as the system. Plus it had zero effect on actual usage of the network: CIFS- or NFS-mounted shares were still basically unusable.
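For anyone who wants to reproduce the measurement, here is a rough sketch of the kind of iperf3 runs described above. The address 10.0.0.2 is a placeholder for the FreeNAS box (which would need `iperf3 -s` running), and the stream count and window size are illustrative values, not the ones from the original test. By default it only prints the commands.

```shell
# Placeholder address of the FreeNAS server running `iperf3 -s` (assumption).
SERVER="${SERVER:-10.0.0.2}"

# Print the iperf3 client commands for both directions, to be run inside the guest VM.
iperf_cmds() {
    server="$1"
    # Normal mode: the VM sends, the server receives.
    echo "iperf3 -c $server -t 10"
    # Reverse mode (-R): the server sends, the VM receives.
    echo "iperf3 -c $server -t 10 -R"
    # Parallel streams (-P) with an explicit window size (-w); values are examples only.
    echo "iperf3 -c $server -t 10 -P 8 -w 512K"
}

# Dry run: show what would be executed.
iperf_cmds "$SERVER"

# Set RUN=1 to actually execute the commands (requires iperf3 on both ends).
if [ "${RUN:-0}" = "1" ]; then
    iperf_cmds "$SERVER" | sh
fi
```

Comparing the first two runs shows whether throughput is lopsided in the way described here.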

Then I reread the guide I linked above and came across a section stating that network offloading was not officially supported due to some driver bugs. I was about to give up when I thought, what the hell, maybe it will fix the issue. I found an old Mellanox manual that listed the offloading settings. I ended up disabling all the offloading in the driver using

ethtool -K eth4 gro off
ethtool -K eth4 gso off
ethtool -K eth4 rxvlan off
ethtool -K eth4 txvlan off
ethtool -K eth4 tso off

and tried iperf3 again. Huzzah! Still 9.9 Gb/s in the good direction, but now 7.5 Gb/s in what was the broken direction. Not perfect, but a whole lot better. Now when I run the phoronix iozone tests on an NFS share mounted inside a VM, it actually works like it is supposed to. So it appears that the driver used by xcp-ng for the Chelsio is buggy when traffic is routed to a guest VM with offloading enabled.
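The five ethtool commands above can be wrapped in a small loop. This is only a sketch: it assumes the 10G interface in dom0 is eth4 as in the post (yours may differ), and it prints the commands unless APPLY=1 is set. The function name and the IFACE/APPLY variables are made up for this example.

```shell
# Interface name taken from the post; override with IFACE=... if yours differs.
IFACE="${IFACE:-eth4}"

# Print one "ethtool -K" command per offload feature disabled in the post.
offload_cmds() {
    iface="$1"
    for feature in gro gso rxvlan txvlan tso; do
        echo "ethtool -K $iface $feature off"
    done
}

# Dry run: show the commands without touching the NIC.
offload_cmds "$IFACE"

# Set APPLY=1 (as root, in dom0) to actually change the settings.
if [ "${APPLY:-0}" = "1" ]; then
    offload_cmds "$IFACE" | sh
    # Lowercase -k lists the current offload settings so the change can be checked.
    ethtool -k "$IFACE"
fi
```

Settings changed with ethtool -K normally do not survive a reboot, so they would need to be reapplied or made persistent some other way.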

Neat! I will have to give that a try.

Thanks for sharing! Still weird you’re only getting 7.9gb. What NIC/card are you using on the R710? The T320 as well?

It was the Chelsio T320 on both sides. Direct connection using a DAC cable.

I imagine there are still some driver bugs causing the problems somewhere in the xcp-ng software stack. But once I got the 80% solution, I moved on. I will probably circle back later and see if I can’t get it all the way to 100%.

Yip, my problem is similar in the sense that I am running pfSense in a VM (1G WAN in, and a 10G pipe out into a UniFi switch). Current state: 7 gb one way and 40 meg the other. I have also read that autotunables in FreeNAS can interfere with your up and down throughput, and that the BIOS setting for the PCI slot needs to be set manually to gen1 and not left on auto. But hey, this is where we come to pull out our hair, become alcoholics and possibly get a viable solution. Will give it a try once I sober up, thanks. Quick question: I see you are configuring eth4, which I assume is the xcp-ng 10G interface, so it would be the same interface my pfSense firewall VM is tied to inside xcp-ng? :) PS: if anyone has any valuable info to share, please do. Have a great one and keep smiling.