pfSense - switch High Availability setup

fred974 · January 4, 2023, 11:59am

Hi,

I have the feeling that my setup isn’t right…
At the head of the network, we have 2x pfSense firewalls set up in HA. The firewalls have 2x SFP+ ports and 3x 1G RJ45 ports:

WAN – sfpplus1
LAN – sfpplus2
LAN2 – RJ45-1
pfSync – RJ45-2
Unused – RJ45-3

The LAN from firewall 1 is connected to mk_switch1 and the LAN from firewall 2 is connected to mk_switch2.

The hypervisor, storage and backup servers each have 2x SFP+ interfaces and are physically connected to both MikroTik switches (1 port per switch). At the software level, we set up the 2 ports as a bonded interface using LACP, and we set up the MikroTik ports across both switches using MLAG.

I cannot help but think that something is wrong but I cannot figure out what yet…
Do I need to swap the WAN/LAN interfaces so that I have the 2x SFP+ ports in a LAG on each firewall, and then every firewall has 1 LAN port in each switch?

brwainer · January 4, 2023, 6:24pm

Right now your only single point of failure is the core switch and whatever is upstream of it. For true redundancy, you would have two ISPs with diverse (not sharing a physical path as much as possible) circuits. This could be a fiber ISP and a Cable ISP, or it could be two DIA circuits that were ordered from the same company as a diverse pair. This is where things start getting expensive, of course, so you have to determine where to draw the line.

Separate from that, I would suggest that you connect each firewall to both switches with LACP/MLAG just like the servers have. This is more of a traffic optimization than a redundancy improvement. Right now the servers are being told they can go to either switch to reach the default gateway, but the active firewall is only present on one of them, requiring traffic that hits the wrong switch to traverse an extra hop.
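To make the extra hop concrete, here is a minimal Python sketch that counts links on the shortest path from each switch to the active firewall. The node names (`fw1`, `fw2`, `mt1`, `mt2`) are hypothetical labels for the two firewalls and the two MTs, `fw1` is assumed to be the active unit, and the `mt1`-`mt2` link is assumed to exist for MLAG. It models cabling only, not LACP hashing or failover.

```python
from collections import deque

def hops(links, src, dst):
    """Number of links on the shortest path from src to dst, or None if unreachable."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical labels; fw1 is assumed to be the active firewall.
CURRENT_LINKS = [("fw1", "mt1"), ("fw2", "mt2"), ("mt1", "mt2")]
CROSS_CONNECTED = CURRENT_LINKS + [("fw1", "mt2"), ("fw2", "mt1")]

for name, links in (("current", CURRENT_LINKS), ("cross-connected", CROSS_CONNECTED)):
    for switch in ("mt1", "mt2"):
        print(f"{name}: {switch} -> fw1 takes {hops(links, switch, 'fw1')} hop(s)")
```

With the current cabling, a frame that lands on `mt2` needs 2 hops to reach `fw1`; with the cross-connection, both switches reach it in 1.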

fred974 · January 5, 2023, 12:02pm

@brwainer thank you for your feedback. What do you think if I were to connect the 2 switches via a DAC cable, MT1 sfp16 to MT2 sfp16? Would that also work?

brwainer · January 5, 2023, 12:09pm

You already have a direct connection between the two MTs to facilitate the MLAG, right? If not, that’s a mistake - I just assumed it was there but not drawn because it isn’t part of normal packet flow. MLAG, or Cisco’s vPC, or Dell’s VLT… lots of names for it… always needs an SFP+ or better connection between the two separate switch planes.

Having this connection, or any other extra connection between the two MTs, would not replace a full cross-connection between the left firewall and the right MT, and vice versa.

fred974 · January 5, 2023, 12:40pm

No, I didn’t connect the 2x MT switches directly, but I can. Here is the front panel of the firewall.

So if I understand what you are saying properly, I need to connect the 2 switches via a DAC cable (MT1 sfp+16 to MT2 sfp+16), keep the opt2 interface for pfSync, move WAN to the opt0 interface, bond ax0+ax1 in LACP, and connect ax0 to the left MT and ax1 to the right MT.
Then create my LAN and Storage interfaces as 2 VLANs on top of the newly created bond.

Is this correct?
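One way to sanity-check the plan on paper is to model the cabling as a graph and knock out one switch at a time. This is only a rough Python sketch: the node names (`fw1`, `fw2`, `mt1`, `mt2`, `server`) are hypothetical labels, and it looks at physical reachability only, ignoring LACP, MLAG state and HA failover between the firewalls.

```python
def reachable(links, src, dst, failed=frozenset()):
    """True if dst can be reached from src while every node in `failed` is down."""
    if src in failed or dst in failed:
        return False
    neighbours = {}
    for a, b in links:
        if a in failed or b in failed:
            continue  # a link is unusable if either end is down
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {src}
    stack = [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

# Hypothetical labels for the cabling before and after the planned change.
BEFORE = [("fw1", "mt1"), ("fw2", "mt2"), ("server", "mt1"), ("server", "mt2")]
PLANNED = BEFORE + [("mt1", "mt2"), ("fw1", "mt2"), ("fw2", "mt1")]

for name, links in (("before", BEFORE), ("planned", PLANNED)):
    for dead in ("mt1", "mt2"):
        ok = reachable(links, "server", "fw1", failed={dead})
        print(f"{name}: {dead} down, server can still reach fw1: {ok}")
```

In this model, losing `mt1` cuts the server off from `fw1` with the old cabling, while with the planned cabling `fw1` stays reachable through either switch.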

brwainer · January 5, 2023, 4:30pm

Yep, sounds correct. When you make the connections between the two MTs, you should review the documentation on MLAG to make sure it is used for that and also that you don’t create a switch loop. I hope you’ve got some form of STP set up across the board.

fred974 · January 5, 2023, 5:17pm

Good point, I will look at this.

fred974 · January 9, 2023, 11:47am

I just had a call with the datacentre, and the setup fee to reconfigure the system from a 10G uplink to a 1G uplink is fairly significant because, as they said, it requires a hardware and configuration modification.

Now that I have both switches directly connected via DAC cable, how much of a performance impact is there when traffic that hits the wrong switch has to traverse an extra hop?

I am aware that it is probably not best practice, but is it that bad to leave it as is?

Thank you

brwainer · January 9, 2023, 6:32pm

It’s an extra fraction of a millisecond, perhaps 0.2ms. You can do the math on TCP windows to see what that delay causes in terms of decreased maximum throughput, but it’s understandably not noticeable.
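As a rough illustration of that math: a single TCP stream can have at most one window of data in flight per round trip, so its throughput is bounded by window size divided by RTT. In the Python sketch below, the 64 KiB window and the two baseline RTTs are assumptions picked for illustration, not measurements from this network, and the 0.2 ms estimate is treated as added round-trip time.

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound for one TCP stream: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds

WINDOW_BYTES = 64 * 1024   # assumed window size (no window scaling)
EXTRA_DELAY_S = 0.0002     # the ~0.2 ms estimate for the extra hop

# Assumed baseline round-trip times, for illustration only.
ASSUMED_BASE_RTTS_S = {"local, 0.5 ms": 0.0005, "internet, 20 ms": 0.020}

for label, base_rtt in ASSUMED_BASE_RTTS_S.items():
    direct = max_throughput_bps(WINDOW_BYTES, base_rtt)
    detour = max_throughput_bps(WINDOW_BYTES, base_rtt + EXTRA_DELAY_S)
    drop_pct = 100 * (1 - detour / direct)
    print(f"{label}: {direct / 1e6:.1f} -> {detour / 1e6:.1f} Mbit/s "
          f"({drop_pct:.1f}% lower ceiling)")
```

This ceiling only matters when the window, rather than the link, is the bottleneck. With window scaling and several parallel streams it is rarely what limits real traffic, and for anything crossing the internet the added delay is a very small share of the total round trip.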