TrueNAS CORE and Intel X520-DA2 10gig: absolutely horrible performance

Hi all

I’m slowly moving my server side of things over to 10gig, and one of the first boxes I tackled was my TrueNAS box.

To give some background, it’s a Dell T710 box with a bunch of 12TB SATA drives and 192GB of RAM. Covering the elephant in the room: the pools are not set up correctly. I set them up before I knew any better, so the 8 drives are in two physical hardware RAID5 arrays with ZFS on top of that… I know it’s horrible, and I have to change it, but I haven’t figured out where to put the 80+ TB of stuff in order to rebuild it (first world problems I guess lol).

Anyway, the box was on a 1gig connection. On that connection I get sustained 80-90MB/s transfers to my desktop (also on 1gig, connected to my Cisco switch), which is what I would expect.

When I replaced it with the X520-DA2, using an FS.com DAC cable run into the Ubiquiti switch, transfers to the same far-end box (my desktop on a 1gig port) drop from the expected 80-90MB/s down to 300-700KB/s, 1MB/s if I’m super lucky, and they bounce up and down and occasionally stop and then continue.

The card is plugged into an x8 slot, but either way I’m not trying to get the full 10gig. Still, the drop to under 1MB/s is just like going back to dial-up lol…

I’ve tried a bunch of different tuning settings so far with no improvement.
An iperf run from the client side shows this:

F:\iperf-3.1.3-win64>iperf3.exe -c 10.10.10.181 -P 1
Connecting to host nas1, port 5201
[ 4] local 10.10.10.20 port 55629 connected to 10.10.10.181 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 106 MBytes 888 Mbits/sec
[ 4] 1.00-2.00 sec 98.9 MBytes 829 Mbits/sec
[ 4] 2.00-3.00 sec 96.0 MBytes 805 Mbits/sec
[ 4] 3.00-4.00 sec 100 MBytes 840 Mbits/sec
[ 4] 4.00-5.00 sec 104 MBytes 874 Mbits/sec
[ 4] 5.00-6.00 sec 91.8 MBytes 769 Mbits/sec
[ 4] 6.00-7.00 sec 92.9 MBytes 780 Mbits/sec
[ 4] 7.00-8.00 sec 101 MBytes 845 Mbits/sec
[ 4] 8.00-9.00 sec 91.0 MBytes 763 Mbits/sec
[ 4] 9.00-10.00 sec 99.9 MBytes 838 Mbits/sec


[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 981 MBytes 823 Mbits/sec sender
[ 4] 0.00-10.00 sec 981 MBytes 823 Mbits/sec receiver

From the server side:

Server listening on 5201 (test #1)

Accepted connection from 10.10.10.20, port 55628
[ 5] local 10.10.10.181 port 5201 connected to 10.10.10.20 port 55629
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 106 MBytes 886 Mbits/sec
[ 5] 1.00-2.00 sec 98.5 MBytes 826 Mbits/sec
[ 5] 2.00-3.00 sec 96.1 MBytes 806 Mbits/sec
[ 5] 3.00-4.00 sec 100 MBytes 840 Mbits/sec
[ 5] 4.00-5.00 sec 104 MBytes 873 Mbits/sec
[ 5] 5.00-6.00 sec 91.7 MBytes 769 Mbits/sec
[ 5] 6.00-7.00 sec 93.1 MBytes 781 Mbits/sec
[ 5] 7.00-8.00 sec 101 MBytes 844 Mbits/sec
[ 5] 8.00-9.00 sec 91.1 MBytes 764 Mbits/sec
[ 5] 9.00-10.00 sec 99.8 MBytes 837 Mbits/sec
[ 5] 10.00-10.00 sec 384 KBytes 953 Mbits/sec

And some Windows screenshots.

on 10gig
image

but on 1gig
image

I also have a PowerShell script that I use to back up to an SSD, and the same difference shows up there too.

10gig

Speed : 3,191,324 Bytes/sec.
Speed : 182.609 MegaBytes/min.

1GIG

Speed : 73,373,607 Bytes/sec.
Speed : 4,198.472 MegaBytes/min.
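
To put those robocopy numbers in line-rate terms, here is the same data as unit arithmetic (Bytes/sec × 8 ÷ 1,000,000), nothing measured fresh:

```shell
# Convert the robocopy Bytes/sec figures above into Mbit/s.
awk 'BEGIN {
  printf "10gig path: %.1f Mbit/s\n", 3191324  * 8 / 1e6   # -> 25.5 Mbit/s
  printf "1gig path:  %.1f Mbit/s\n", 73373607 * 8 / 1e6   # -> 587.0 Mbit/s
}'
```

So the “10gig” path is actually moving data at roughly 25 Mbit/s, against ~587 Mbit/s on the plain 1gig link.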

Everything I saw online seems to point to people not getting the upper end of the speed band, but I found very little, if anything, showing this level of degradation. I don’t expect to be able to squeeze every last bit out of the 10gig; I’m not looking for that in any way.

I just dunno…

Tks to all for your time and comments :slight_smile:

An additional data point: writing to the TrueNAS box is fine. Sustained writes of 110MB/s from the same 1gig desktop to the 10gig NAS.

So it’s only the read side that seems to be affected.
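
Since only reads are affected, iperf3’s reverse mode is a handy way to separate the network from the pool entirely: with -R the server (the NAS) does the sending, which matches the direction of a read while never touching the disks. A sketch, reusing the NAS IP from the test above (guarded so it’s harmless to paste on a box without iperf3):

```shell
# Normal mode: the client sends to the NAS (the write direction, which is fine).
# Reverse mode (-R): the NAS sends to the client (the slow read direction).
# If -R also collapses to sub-1MB/s, the problem is the network path,
# not ZFS or the RAID layout underneath it.
if command -v iperf3 >/dev/null 2>&1; then
  iperf3 -c 10.10.10.181 -t 30        # client -> NAS
  iperf3 -c 10.10.10.181 -R -t 30     # NAS -> client
fi
```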

Are you disabling the 1G connection when testing the 10G? Could be a problem of them being on the same subnet and not routing things properly.

It’s a good thought Tom, and I should have mentioned it in the initial post: yes, I actually pulled the cables and “deleted” the interfaces in TrueNAS just to be extra sure.

The only interface on the box that still exists in the same subnet is the iDRAC. I could try pulling it too just for giggles, but I can’t see that being the issue. I’ll try disabling it on the switch and run another test to see.

Yup, no change with the iDRAC port disabled on the switch.

For what it’s worth, some more background.
I had initially set this up as a failover LAGG between two switches, on a trunk port.
I thought that might have been causing the issue for some reason, so I pulled everything down to one port, no trunk, as basic as I could make it… it didn’t seem to make any difference.

I am not seeing this issue with the X520-DA1 (single port) on Supermicro X10 series motherboards. I can run several computers at their full 1gig connections (simultaneously) until the drives saturate.

Things I would check:

Pull the CPU that is associated with that slot, or move the NIC to a different slot that is handled by a second processor (if possible).

If pulling the CPU, check for tarnished spots and clean them. Also check for bent pins in the socket; I just fixed an issue with one of my servers due to this, where the entire slot for an X520 card wasn’t working correctly. I pulled both CPUs, blew some gunk out of the socket that handled the one slot I have, and swapped the CPUs to make sure it wasn’t a processor failure. Now it works.

Check the riser card to make sure nothing is shorting and there are no broken solder joints. Make sure the fingers and the slot are clean.

If you have optical modules, give them a shot, but the DAC should be OK.

Interesting thought but I think I just found my issue.

I welcome anyone’s thoughts on this…

I tried looking at the slots and cleaning them, same with the procs… I switched the card to an x16 slot just to see… no change…

I then tried something out of sheer frustration. I pulled the DAC cable and put in two 1gig RJ45 SFP+ modules (that are supposed to be for Ubiquiti) just to see what would happen… Everything came up at 1gig and worked perfectly. So I dug further, found the CLI commands for the Ubiquiti switch, and with the DAC cable installed I count input errors, CRCs, and sometimes output errors and babbles, almost constantly…

I have tried 6 of the 8 DAC cables that I had custom ordered from FS (custom as in Ubiquiti-coded at one end and Intel-coded at the other). They ALL are counting errors, every last friggin one of them. I could see 1 bad cable, but 6? I can only assume the other 2 are also bad. I even thought, OK, they say they are custom ends for each vendor, maybe they were mislabeled, so I tried turning one around… nope… still just as bad…
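
For anyone else chasing this kind of thing: the same corruption should be countable from the TrueNAS CORE (FreeBSD) side too, not just on the switch CLI. A sketch, assuming the X520 shows up as ix0 (adjust the interface name to match your box; the exact sysctl counter names vary by driver version, hence the grep):

```shell
# FreeBSD-only sketch; guarded so it no-ops on anything else.
if [ "$(uname -s)" = "FreeBSD" ]; then
  # Watch the Ierrs/Oerrs columns climb alongside the switch's CRC counters;
  # errors here point at the physical link (cable/transceiver), not the pool.
  netstat -I ix0
  # The Intel ix(4) driver exposes per-NIC stats under the dev.ix.<n> sysctl
  # tree; grep for error/CRC-related counters rather than guessing exact names.
  sysctl dev.ix.0 | egrep -i 'err|crc'
fi
```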

Now I’m worried about the other Ubiquiti-to-Supermicro ones I got for my pfSense boxes that I haven’t even touched yet…

I have heard really good things about FS for cables but so far I’m not impressed.

I just can’t believe all of them are bad… like how…

Hmmm… I guess I’d better take a close look at my 10Gtek DAC cables. I’m working on getting my XCP-NG system at work set up on 10gig with DAC and a MikroTik CRS309-1-8S switch. I really want this on DAC cables, but I did buy what I need to get most of the servers on optical if needed. For my home lab I’ve been buying these (Genuine Cisco SFP-10G-SR V03 10GBASE-SR SFP+ Transceiver Module 10-2415-03 | eBay) and they seem to work OK. I’ve been buying the round-style fiber jumpers on Amazon for under $10 each, so it’s not a very expensive way to get 10gig working, just more power draw.

I doubt it’s “bad” cables, but rather the devices not configuring the interface parameters to support that DAC cable. Optical modules and passive DACs require very different amplitude and emphasis settings. I suspect the interface isn’t autonegotiating this, so you have poor default settings. I’m not sure if Ubiquiti or the NIC allows you to tune the pre, post, and main taps on that 10gig interface.

OK, my problems came back today… It seems like if I connect another network by way of the 1gig copper connections on the back of the server, it causes problems on my 10gig card. I’m not sure if it affects only the web GUI or the shares as well. I’ve also had this server up and down several times today, so it could just be a temporary thing that I won’t find until tomorrow. The 10gig network is a 172.xx.xx.xx and the copper a 192.168.xx.xx, on two physically separate switches (for now); the only common connection is my pfSense router letting me talk to the 172 network.

Also a correction: I’m using X520-SR1 cards, not the DA1 I mentioned above. Going to let this sit for a bit before I configure the XCP-NG SR, in case I need to use a different server.

Can you statically set the negotiation to 10G on your switchport?

So I’m finally getting around to updating this story.

The issue turned out to be the length of the cables. Or so FS determined. We did many tests, and I actually tried one of the cables that I ordered to go from the Supermicro box to the Ubiquiti switch, and it was stable.

So in the end FS said the 7m cables were losing signal, they suspect due to the length… They very nicely offered to swap them out for 7m AOC cables. Since we did that, everything has been stable.

While it took a while, FS did go to bat on the issue and were helpful, so I can’t say much bad about them.

So it looks like one issue down, on to the next one :slight_smile: