I am hoping to get some configuration help on a nagging issue I have with my mixed speed LAN.
My key problematic equipment includes a Netgate 6100 running pfSense and a Unifi 24-port PoE Enterprise switch. My problematic PCs, a mix of Linux and Windows all with Intel NICs, are connected to the Unifi switch via RJ45 at 1 GbE.
The setup I am trying consists of a 1.2 Gbps WAN (Comcast) entering on a 2.5 GbE Netgate RJ45 port. I am attempting to link the Netgate and Unifi using an SFP+ DAC negotiated at 10 GbE. When doing so, I can run iperf from any of the PCs over the WAN, or to the Netgate, and using UDP I get near the expected 1 Gbps on the PCs. However, when running a TCP connection (iperf, a WAN download, or a WAN speed test against a large number of standard speed test sites), I never get better than 240 Mbps on downloads to the PCs.
When troubleshooting, I attempted to force the SFP+ link to 2.5 GbE. I also switched DAC cables (one approved by Netgate and a second bought from Ubiquiti). I also moved the Netgate-to-Unifi link to one of the 2.5 GbE RJ45 ports. All of the combinations continue to result in the PCs' TCP download speed being approximately one-fourth of what the 1 GbE NICs should deliver. The ONLY thing that restores the PCs to near 1 Gbps download speed is to force EITHER the 2.5 GbE RJ45 connection OR the SFP+ DAC ports to 1 GbE. When limiting the Netgate-to-Unifi LAN link to 1 GbE, the 1 GbE PCs perform fine. I can use either the SFP+ DAC or the RJ45 ports forced to 1 GbE between the Netgate and Unifi and all is fine for the PCs. However, I obviously lose all of the anticipated benefit of the 1.2 Gbps WAN and the 10 GbE LAN for other server needs.
If I plug any of the Windows or Linux PCs via RJ45 into a Netgate 2.5 GbE port, auto-negotiation connects them at 1 GbE and I get full 1 Gbps TCP download speeds.
My trial and error has proven that my problem is NOT the SFP+ DAC. I have also proven that the 10 GbE LAN link itself is not the cause, since I get the same results at 10 GbE or 2.5 GbE between the Netgate and Unifi.
Can any of you suggest what I am doing wrong, or is this a side effect that any 1 GbE host needs to live with when connected to Unifi (since the direct Netgate connection is fine when bypassing the Unifi switch)? Are there TCP windowing settings that are required for Windows and Linux? Or flow control settings on the Unifi? Or traffic shaping on the pfSense? I have exhausted my knowledge, having watched every relevant video from Tom and others and tried every trial-and-error change I could think of. Any suggestions would be MUCH appreciated, or an explanation of why I need to live with the results or instead return everything to 1 GbE throughout the LAN.
When you are running your tests, how many streams are you running?
Just a single stream (the default) on the iperf tests. For the various speed test sites (e.g. speed.com, Ookla, Fast, Xfinity), you do not specify the streams from their website test, so I assume these just use defaults for the TCP connection. Both iperf and ALL of the various speed test sites give results of approximately 240-250 Mbps. And when I connect to the pfSense/Netgate directly, I always get at least 1.2 Gbps when visiting all of the speed test sites.
Thank you for the quick question/clarification.
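For anyone repeating the single-stream versus multi-stream comparison, here is a minimal sketch that only builds the command lines. It assumes iperf3 (the thread just says "iperf") and uses a placeholder server address; the helper name `iperf3_command` is made up for illustration. If several parallel streams together reach line rate while one stream stalls, that would point at a per-connection limit rather than a link limit.

```python
def iperf3_command(server, streams=1, reverse=True, udp=False, seconds=10):
    """Build an iperf3 client command line as a list of arguments."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds)]
    if streams > 1:
        cmd += ["-P", str(streams)]   # number of parallel client streams
    if reverse:
        cmd.append("-R")              # server sends, so this measures download
    if udp:
        cmd += ["-u", "-b", "1G"]     # UDP with a 1 Gbit/s target rate
    return cmd


# 192.0.2.1 is a documentation placeholder address, not a real iperf3 server
for n in (1, 4, 8):
    print(" ".join(iperf3_command("192.0.2.1", streams=n)))
```

The printed lines can be pasted into a shell on the test PC once the placeholder address is replaced with a host running `iperf3 -s`.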
I wanted to add a few more test results. I restored the 10 GbE SFP+ DAC connection between my pfSense (Netgate 6100) and Unifi switch. I also obtained a 2.5 GbE PC for testing (running Linux). The PC negotiated to 2.5 GbE with the Unifi port, and test results have been fantastic. I am exceeding my rated/paid 1.2 Gbps WAN link by a decent percentage, typically getting 1.4 to 1.7 Gbps in speed tests. So my issue is definitely the 1 GbE NICs (Windows or Linux makes no difference). The smoking gun seems to be that the 10 GbE uplink between pfSense and Unifi overruns the TCP windowing capabilities of the PCs with the 1 GbE cards.
Any thoughts or possible next steps for testing? It seems like the only thing I haven’t tried is pfSense traffic shaping with filters to limit the bandwidth to 1 Gbps, which the PCs can obviously handle. This will not help with intra-VLAN traffic, but anything passing through pfSense (Internet gateway, inter-VLAN routing) might be helped by this seeming hack.
Any other ideas?
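As a rough way to sanity-check the TCP-window theory above, the classic bound is throughput ≤ window / round-trip time. The sketch below is only that arithmetic; the 64 KiB window and 20 ms RTT are assumed example values, not measurements from this network.

```python
def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on single-stream TCP throughput for a given window and RTT."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


def required_window_bytes(target_mbps, rtt_ms):
    """Window (bandwidth-delay product) needed to sustain target_mbps at rtt_ms."""
    return target_mbps * 1e6 * (rtt_ms / 1000) / 8


# Assumed example values: 64 KiB window, 20 ms RTT to a speed test server
print(max_throughput_mbps(64 * 1024, 20))
print(required_window_bytes(1000, 20))
```

Current Windows and Linux TCP stacks normally auto-tune the receive window, so a window limit like this is usually something to rule out rather than something to configure by hand.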
I wanted to follow up with my solution on this nagging topic for the benefit of anyone who may have been following, in particular for @UK_TechDad, who originated the thread.
My ultimate solution was to follow one of @LTS_Tom's past videos on bufferbloat: https://www.youtube.com/watch?v=iXqExAALzR8. My various layers of networking equipment had widely different speeds on each link. With my 1.2 Gbps WAN down-link and 35 Mbps WAN up-link, the radically different speeds on this asymmetric link caused the TCP ACKs to get totally backed up, which eventually intensified and caused the TCP session to nose-dive with dropped packets and other protocol hell. UDP was never an issue, and I should have recognized the TCP ACKs and the WAN asymmetry as the smoking gun.
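A back-of-the-envelope check of the ACK load described above, as a minimal sketch. It assumes 1500-byte packets on the down-link, one ACK per two segments (delayed ACKs), and a 64-byte minimum Ethernet frame per ACK. These are textbook assumptions rather than measurements from this network, and the real overhead on a cable up-link will differ.

```python
def ack_rate_mbps(download_mbps, packet_bytes=1500, segments_per_ack=2, ack_bytes=64):
    """Estimate the upstream bandwidth consumed by TCP ACKs for a given download rate."""
    packets_per_sec = download_mbps * 1e6 / (packet_bytes * 8)
    acks_per_sec = packets_per_sec / segments_per_ack
    return acks_per_sec * ack_bytes * 8 / 1e6


down_mbps = 1200   # WAN down-link from the post
up_mbps = 35       # WAN up-link from the post

acks = ack_rate_mbps(down_mbps)
print(f"ACK traffic: {acks:.1f} Mbps, {acks / up_mbps:.0%} of the up-link")
```

Under those assumptions a full-rate download needs roughly 25 Mbps of upstream for ACKs alone, a large share of a 35 Mbps up-link before any other upload traffic is counted.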
I followed Tom’s video apart from setting my own names and adjusting for my WAN link speeds in both directions. I also watched an old video from Netgate (https://www.youtube.com/watch?v=o8nL81DzTlU&t=542s) starting at the 6:17 mark. Further reading on bufferbloat at Bufferbloat.net gave me a better understanding, supplemented by the pfSense Traffic Shaper documentation ("What the Traffic Shaper can do for a Network").
The best testing site to use is Waveform’s: https://www.waveform.com/tools/bufferbloat. The many other speed test sites give widely differing results, and do not really zero in on latency increases above the baseline on both the download and upload links. I used this site each time I made tweaks to the down-link and up-link speed settings in the pfSense Traffic Shaper, each time using a new private browsing tab to force a new TCP connection for the test. My target was maximum speed at or above my ISP-rated speeds while keeping the latency increase over the baseline at (or near) +0 on both the down-link and the up-link. After getting the CoDel bandwidth settings “just right” for my primary Ethernet-attached test PC, I then checked whether there were any adverse impacts elsewhere. Every other Ethernet-attached device was fine given the NIC in each device. Per the Waveform instructions, stay away from WiFi testing since it is not pertinent, will screw with your thought clarity, and is not a factor in the root cause of the problem.
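The tuning loop described above can be written down as a simple selection rule: among the shaper settings tried, keep the one with the highest measured speed whose added latency stays under a small threshold. This is only a sketch of that bookkeeping. The `Trial` class, the 5 ms threshold, and every number in the list are placeholders for illustration, not results from this thread.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    shaper_mbps: int         # bandwidth value entered in the traffic shaper
    measured_mbps: float     # speed reported by the bufferbloat test
    added_latency_ms: float  # latency increase over the unloaded baseline


def best_setting(trials, max_added_latency_ms=5.0):
    """Return the fastest trial whose added latency is within the threshold."""
    ok = [t for t in trials if t.added_latency_ms <= max_added_latency_ms]
    return max(ok, key=lambda t: t.measured_mbps) if ok else None


# Placeholder numbers for illustration only, not measurements
trials = [
    Trial(1400, 1310.0, 42.0),
    Trial(1300, 1250.0, 6.0),
    Trial(1250, 1205.0, 2.0),
    Trial(1100, 1060.0, 0.0),
]
print(best_setting(trials))
```

The same record keeping applies separately to the down-link and the up-link, since each direction gets its own bandwidth setting.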
@UK_TechDad: I hope this ends up helping in some way if you are still experiencing problems. I wanted to keep digging … and learning … and this gave me at least “rookie” understanding of an area of networking I knew nothing about. Long grind, but fantastic learning experience.
Interesting that the title of the thread, at least in my case, had NOTHING to do with the root cause. The increase in the speed between pfSense router and Unifi switch beyond the WAN and PC NIC speeds simply made the problem get much worse.
Happy holiday season and all the best for 2024 to everyone who may be reading this in late December 2023.
@TNewshel you absolute legend, I thought this thread was going to die without resolution. Hopefully this is going to really help. If nothing else, it’s given me some holiday reading to do and videos to watch.
Thanks so much for coming back on this topic as it’s a real issue sometimes in the building I look after.
Now I have Suricata and this as topics for learning next year.
Happy holiday season to all. Thanks @TNewshel and @xMAXIMUSx for the comments
Finally thank you so so much @LTS_Tom for everything you put out and continue to help in explaining to the masses.
Holiday mic drop
James: Let me know how you eventually make out. I can share more specifics with you in the future if needed/helpful. Good luck, and happy learning!
I TOTALLY echo your sentiments about Tom.