Here’s a simplified summary… I’m testing speeds internal to my network…
pfSense bare metal router (E5-2650 v2 Xeon, Mellanox CX312A dual 10Gb SFP+ NIC) connected to a Dell X1052P switch. MikroTik CRS309-1G-8S+IN also connected to the Dell switch. Three other bare metal machines (A, B, C) connected to the MikroTik, each with their own Mellanox CX312A.
A and B are on the same VLAN routed and managed (obviously) in pfSense.
C is on another VLAN.
iperf from A to and from pfSense, or B to and from pfSense, yields 9gb results or higher.
iperf from C to and from pfSense yields 9gb results or higher.
iperf from A to and from C, or B to and from C, yields 4gb results if I’m lucky.
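For anyone reproducing this, here's a sketch of how I'd run the cross-VLAN leg of that matrix with iperf3. The address, stream count, and duration are made-up values for illustration, not my actual setup; the script just prints the commands so you can see the flags. `-P` runs parallel streams (single-stream results can hide per-core limits) and `-R` reverses direction without swapping the server and client roles.

```shell
# Hypothetical cross-VLAN iperf3 test sketch. Start "iperf3 -s" on the
# server end (host C here) first. All values below are assumptions.
host_c=192.168.20.30   # host C, the cross-VLAN target (assumed address)
streams=4              # -P: parallel streams
secs=30                # -t: test duration in seconds

# This sketch only prints the commands to run from host A or B:
echo "iperf3 -c ${host_c} -P ${streams} -t ${secs}"      # A/B -> C (routed)
echo "iperf3 -c ${host_c} -P ${streams} -t ${secs} -R"   # -R: C -> A/B
```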
I expected this though… A/B to pfSense is the same VLAN. C to pfSense is the same VLAN. A/B to C crosses VLANs, so pfSense needs to route the traffic. As I have a single cable from the pfSense box to the Dell switch, I would expect my bandwidth to be cut in half…
Now… I have an additional open port on the pfSense NIC and on the Dell switch. I’ve read that using LACP won’t double your bandwidth. I also don’t believe the Xeon is a bottleneck. If I added another cable between the router and switch, is it possible to double the bandwidth on my network with some setup in pfSense? My thought was that one cable could be ingress and the other egress, so there is always 10gb available for traffic in either direction. Is something like that even possible?
…I also realize that if something like this is possible, I’ll probably have to replicate it between the Dell and MikroTik switches (which I can, I have the ports available) but I’ll cross that bridge later…
If you link aggregate your two 10Gb connections you can have a total bandwidth of 20Gb. That said, you cannot achieve 20Gb from a single stream; a single stream will only saturate 10Gb. The only way to saturate 20Gb is with multiple streams.
Makes sense, as I’m not trying to get a 20gb stream, just the max 10gb stream. That being said, how do I set up LACP, or a LAGG, or a bond, or something that would force one of the two cables to be ingress only and the other egress only, so that traffic coming into the router and going out of it (like an iperf test across VLANs) uses each connection/cable independently and achieves 10gb each way instead of 5gb each way? The bottleneck makes sense in the same way a wifi extender does (you lose half of the available bandwidth). I’m hoping I can add a second cable to get rid of that loss…
It doesn’t function that way. You’ll need to read the man pages to fully understand what you are getting into. Make sure to read the aggregation protocols section.
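To make the lagg(4) pointer concrete, here's roughly the rc.conf shape of an LACP lagg on stock FreeBSD; pfSense builds the same thing from its GUI (Interfaces > Assignments > LAGGs), so this is a sketch for understanding, not something you'd hand-edit on pfSense. The interface names (`mlxen0`/`mlxen1` for a ConnectX-3 on the mlx4en driver) are assumptions.

```shell
# /etc/rc.conf sketch of an LACP lagg on plain FreeBSD.
# Interface names are assumptions for illustration.
cloned_interfaces="lagg0"
ifconfig_mlxen0="up"
ifconfig_mlxen1="up"
ifconfig_lagg0="laggproto lacp laggport mlxen0 laggport mlxen1 up"
```

The key point from the man page: LACP hashes each flow onto one member port based on its headers, so any single stream still rides one 10Gb link, and there is no ingress-only/egress-only mode to configure.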
10Gb is full-duplex only. This means each cable can carry 10Gb in both directions at the same time. I know you said you don’t believe it’s the Xeon, but it does sound like your router is the bottleneck when routing between VLANs. Try looking at the CPU usage/load while running the tests. Switching and/or routing packets takes a lot of processing power, which is why dedicated hardware uses dedicated chips rather than general-purpose CPUs for these tasks.
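A back-of-the-envelope way to see why full duplex matters here (numbers are illustrative): a hairpinned A-to-C flow crosses the trunk twice, but once in each direction, and full duplex gives each direction its own 10Gb lane.

```shell
# Illustrative arithmetic only: a routed A->C flow uses the trunk's
# ingress lane once (switch->router) and its egress lane once
# (router->switch). Full duplex means those lanes are independent.
link_per_direction=10   # Gb/s each way on one 10Gb cable
flow=10                 # Gb/s offered by a single stream
echo "trunk ingress: ${flow} of ${link_per_direction} Gb/s used"
echo "trunk egress:  ${flow} of ${link_per_direction} Gb/s used"
# Neither lane is doubled up, so router-on-a-stick does not halve a
# one-way flow; a ~4-5gb result points at something else entirely.
```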
I did not know that. Thank you. I’ve confirmed on the switches, each connection is showing full duplex. As for the CPU… Utilization on the router “spikes” to 13% during a test. Could MTU settings be at play here?
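On the MTU question, here's a sketch of what enabling jumbo frames would look like at the FreeBSD level (interface name is an assumption; on pfSense you'd set the MTU field on the interface page instead). The important caveat: it only helps if every hop agrees.

```shell
# /etc/rc.conf sketch for jumbo frames. Interface name is an assumption.
# This only helps if EVERY hop (both NIC ports, the Dell, the MikroTik,
# and the end hosts) agrees on the same MTU; a mismatch causes
# fragmentation or drops and is worse than staying at 1500.
ifconfig_mlxen0="up mtu 9000"
# verify afterwards with: ifconfig mlxen0 | grep mtu
```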
Here’s a snapshot of top while an iperf test is running:
CPU: 1.4% user, 4.1% nice, 0.9% system, 4.7% interrupt, 89.0% idle
Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
32 CPUs : 2 package(s) x 8 core(s) x 2 hardware threads
AES-NI CPU Crypto: Yes (active)
IPsec-MB Crypto: Yes (inactive)
QAT Crypto: No
Following as I want to see your ultimate result.
L3 switches are not an option?
Unfortunately, L3 isn’t an option for me. I’ve got a Dell X1052P (L2+), so no go there… My other switch is the MikroTik CRS309-1G-8S+IN, and I think that can work as L3 when running RouterOS, but it’s not powerful enough to route 10gb traffic. I’m running SwOS on it to get rid of that bottleneck, but I think that limits it to L2 functionality.
This is relatively new to me though (10gb networking, not so much networking in general) so if you know something I don’t, or I’m missing something simple/stupid, I welcome the correction.
How much overhead does running multiple VLANs on the same cable add? If full duplex is working (assuming it is), I still should be able to get close to 10gb in each direction even if each direction is on a different VLAN, correct?
Based on the information above, it doesn’t appear the CPU is the bottleneck, but I don’t know what else it might be as I don’t know how the routing is performed from a coding and hardware perspective.
Having VLANs across the same bit of cable will make no difference to how much data the cable can carry. The only difference is that you can’t have all VLANs running at 10Gb on the same cable at once; that is to say, they all share the 10Gb pipe. If you think of it like a road that can take 10,000 cars an hour, it doesn’t matter if the cars are all red or all blue or a mixture, you can still only have 10,000 cars, and the road has no idea. The VLANs are the colour of the car in that analogy.
The overhead comes from routing from one VLAN to another: something has to inspect every packet, check whether it is allowed to go from A to B, and then change the VLAN tag before sending it on its way.
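For context, this is the shape of a router-on-a-stick VLAN setup at the FreeBSD level; pfSense builds the same thing from Interfaces > Assignments > VLANs. The tags and subnets below are made-up examples, not the poster's actual values.

```shell
# /etc/rc.conf sketch of router-on-a-stick VLANs on plain FreeBSD.
# Interface name, tags, and subnets are all assumptions.
vlans_mlxen0="10 20"                          # creates mlxen0.10, mlxen0.20
ifconfig_mlxen0="up"
ifconfig_mlxen0_10="inet 192.168.10.1/24"     # VLAN for hosts A and B
ifconfig_mlxen0_20="inet 192.168.20.1/24"     # VLAN for host C
```

The 802.1Q tag itself is only 4 bytes per frame, so the on-the-wire overhead is negligible; as noted above, the real cost is the per-packet routing and firewall work on the router.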
Still haven’t solved it but wanted to drop in with a few updates incase someone in the future runs into the same issues…
tl;dr My very unscientific guess is that the FreeBSD drivers for these cards can’t route at line speed. They’re still very fast though, just not as fast as if I were running Linux.
Swapped in an Intel X520-DA2 NIC for the Mellanox CX312A ConnectX-3, figuring the Intel drivers would be a little more optimized (everyone everywhere seems to say Intel cards are better for FreeBSD). First issue: inter-VLAN traffic speeds dropped from 4gb down to 2-3gb, and the host A/B-to-pfSense test from my original post dropped to 4gb.
I then thought maybe the multiple VLANs on the NIC were the issue. Hooked another machine with a ConnectX-3 card directly into the Intel’s second port on the firewall, which I wasn’t otherwise using. Configured both ends with static IPs and ran iperf. 3-4gb at best. Hmmm…
Popped the Mellanox card back into the system and repeated the test from my original post. Same results. Then I repeated the direct-connection test above: ~4-5gb over a DAC cable. Odd, since I was getting close to line speed when going through a switch before… Regardless, since this dedicated port is not configured for VLANs, it does not appear that VLAN overhead is causing the “slow” speed.
Both cards, before I rebuilt my pfSense box to use them, were installed in Linux machines (technically two Proxmox hosts with the cards passed through to Linux guests, one in each, so the traffic would actually leave the box), and both cards were able to get over 9gb when talking to each other through the network via a switch.
So my uneducated, very-much-a-guess, non-scientific thought is that the drivers for these cards aren’t capable of line speeds on FreeBSD, since the exact same cards and cables do get line speeds when running Linux. It’s not an issue for my setup, because 5gb is still more than anything I have can push. But like most homelabbers, I’m trying to eke out as much performance as possible… even if I can’t use it.
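Before blaming the driver entirely, there are a few FreeBSD/pfSense tunables commonly suggested for 10Gb NICs that are worth trying. This is a sketch of knobs I've seen recommended, not a guaranteed fix; defaults vary by version, so change one at a time and retest.

```shell
# Commonly-suggested FreeBSD/pfSense 10Gb tuning knobs (a sketch to
# experiment with, not a definitive recipe).

# /boot/loader.conf.local:
kern.ipc.nmbclusters="1000000"       # more mbuf clusters for 10Gb queues

# /etc/sysctl.conf (or System > Advanced > System Tunables in pfSense):
net.inet.tcp.sendbuf_max=16777216    # allow TCP windows to grow for iperf
net.inet.tcp.recvbuf_max=16777216
```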
If anyone has experience with the Intel X520-DA2 or Mellanox CX312A ConnectX-3 cards getting line speeds out of pfSense, please let me know. I’m curious if you had to do anything to get those speeds.