ECS-Aggregation MC-LAG and CARP Multicast

tmacalp · April 3, 2026, 6:11pm

We recently bought a pair of UniFi ECS-Aggregation switches to use as core switches with MC-LAG, now that the 3.0.8 firmware (hopefully) addresses many of the stability issues of previous versions. We also use a pair of Netgate 1537 pfSense+ gateways set up in HA.

We have been testing the ECS-Aggregation switches at the edge of our network, but haven’t yet moved them to replace our core switches. We’re looking to MC-LAG each pfSense+ box to the new switches, but are concerned about how CARP will be handled by the switches. The current firmware does not pass multicast traffic across the peer link, which is what CARP/VRRP rely on by default. Would this cause issues with missed CARP heartbeats?

Does anyone else successfully use pfSense CARP with ECS-Aggregation switches or other vendor switches with MC-LAG?

Also, we’re using pfSense+, so would switching to unicast CARP help? Netgate’s CARP documentation warns:

However, use of unicast mode on traditional infrastructure where multicast is more suitable should be avoided. In unicast mode switches may flood packets for unicast CARP VIPs to all ports, leading to significant security and performance concerns.

Could someone please explain the practical security concerns caused by switching to unicast CARP?
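For context on Netgate’s warning, here is a toy model of how an Ethernet switch decides where to send a frame. The switch learns which port each source MAC address appears on. It then sends frames for a learned destination out of that one port and floods frames for an unknown destination out of every other port. The link to CARP is an assumption, not something the quoted docs spell out: in unicast mode, nothing may keep the VIP’s virtual MAC (00:00:5e:00:01:<VHID> for CARP) fresh in the switch’s table, so ordinary client traffic to the VIP is treated as unknown unicast. The class, port, and MAC values below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSwitch:
    """Toy transparent bridge: learn source MACs, flood unknown destinations."""

    ports: list[str]
    mac_table: dict[str, str] = field(default_factory=dict)

    def forward(self, in_port: str, src_mac: str, dst_mac: str) -> list[str]:
        # Remember which port the sender is reachable through.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is not None:
            # Known destination: exactly one egress port (none if it is the ingress port).
            return [] if out_port == in_port else [out_port]
        # Unknown destination: flood to every port except the one it arrived on.
        return [p for p in self.ports if p != in_port]


if __name__ == "__main__":
    sw = LearningSwitch(ports=["p1", "p2", "p3", "p4"])
    vip_mac = "00:00:5e:00:01:01"  # hypothetical VHID 1
    client = "aa:bb:cc:00:00:01"
    print(sw.forward("p1", client, vip_mac))  # ['p2', 'p3', 'p4'] (flooded)
    sw.forward("p2", vip_mac, client)  # a frame sourced from the VIP MAC arrives on p2
    print(sw.forward("p1", client, vip_mac))  # ['p2'] (learned)
```

If that assumption holds, the concern is not the small heartbeats but *all* data traffic addressed to the VIP. Every host on the VLAN can see it, which is the security issue. Every port has to carry it, which is the performance issue.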

And while we’re at it, there are some serious limitations with UniFi’s ECS-Aggregation switches that I don’t remember Tom’s video addressing:

You can currently define only 26 MC-LAG target groups. If you’re like us and looking to set up MC-LAG groups of one port on each switch, you’ll only be able to use 26 ports. UniFi claims this may be addressed in a future firmware release, and I can’t blame them for focusing on stability. UniFi only recently added this limitation as an asterisk next to MC-LAG on their store page; you’ll notice it is missing in Tom’s video when he shows the store page.
If you’ve set up MC-LAG, you’ll be unable to use normal port aggregation on any of the remaining ports. The option for normal aggregation is grayed out in port config. So you’ll now probably be left with 26 ports on each switch that can’t even be used for normal aggregation. This was a surprise and there is no mention of this limitation on the store page.

Thanks for any advice and sorry for the wall of text!

LTS_Tom · April 4, 2026, 10:23am

I have not tested these with pfSense, but that limitation leads me to believe it will not work with an HA setup.

markadavis · April 4, 2026, 1:07pm

I don’t understand why it matters. If both the PF and ECS pairs are HA, then each PF is connected to both ECS switches with MC-LAG. Won’t the PF heartbeats already be seen by both switches through their respective MC-LAG port connections, without having to traverse the ECS peer link when using multicast? Granted, this stuff is over my head... am I missing something?
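A minimal model shows when a heartbeat would have to cross the peer link. It assumes generic MC-LAG forwarding, which is not confirmed UniFi behavior: a switch delivers to an MC-LAG port through its own member link when that member is up and uses the peer link only otherwise. Under that assumption, the reasoning above holds as long as every member link is up. Switch names `A`/`B` and the function name are hypothetical. `ingress` is whichever switch the sending firewall’s LAG hash happens to pick.

```python
def needs_peer_link(ingress: str, receiver_live_members: set[str]) -> bool:
    """Return True if a frame entering the MC-LAG pair on `ingress` can only reach
    the receiving firewall by crossing the peer link (generic MC-LAG model)."""
    if not receiver_live_members:
        raise ValueError("receiver has no live MC-LAG member links")
    # Local delivery is preferred; the peer link is used only when the receiver
    # has no live member on the switch the frame arrived on.
    return ingress not in receiver_live_members


if __name__ == "__main__":
    for live in ({"A", "B"}, {"B"}):  # {"B"}: the receiver's DAC to switch A has failed
        for ingress in ("A", "B"):
            print(sorted(live), ingress, needs_peer_link(ingress, live))
```

In this model, only a failed member link combined with an unlucky hash forces the peer link. That is the single module or DAC failure case worth testing.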

And if one did use unicast instead, why would that cause a performance problem? Isn’t that just a tiny bit of occasional data? Even if those packets went to all ports, how is that significant (or worse than multicast) for modern, fast switches?

brainjake · June 30, 2026, 8:55pm

Just wanted to chime in on this, since we are staging this exact setup for a network rebuild now: Netgate 8300s in HA with Netgate’s 25G add-in card (Intel E810 chipset), LACP’d to two ECS-Aggregation switches in MC-LAG. It is still on the bench, so we have not pushed traffic at scale through it. So far, with pfSense Plus 26.03.1 and ECS-Aggregation firmware 3.0.8, multicast CARP traffic between the two pfSense nodes is working on both the primary LAGG interface and the multiple VLANs hanging off it. Here’s hoping it holds up under load.

xMAXIMUSx · June 30, 2026, 9:45pm

I hope you are running TNSR to get the throughput of 25G. The kernel is going to get in the way beyond 10G. VPP is going to be your friend.

tmacalp · July 1, 2026, 2:54pm

Great! I hope everything works for you! I’d make sure to test the scenario of a single module or DAC cable failing. That is the scenario we’re worried about, where multicast traffic might have to cross the ECS-Aggregation switches’ peer link. In that case, the multicast traffic (VRRP/CARP heartbeats) may not cross the peer link, causing missed heartbeats and flapping.
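Here is a rough timer sketch showing how lost heartbeats turn into flapping. It assumes FreeBSD-style CARP timing. The master advertises every advbase + advskew/256 seconds. A backup that hears nothing for 3 × advbase + its own advskew/256 seconds promotes itself to MASTER. The values used (advbase 1, backup advskew 100) are assumptions, so check the settings on your own VIPs. Function names are hypothetical.

```python
def master_down_interval(advbase: int, advskew: int) -> float:
    """Seconds a backup waits without hearing the master (assumed 3 * advbase + skew)."""
    return 3 * advbase + advskew / 256


def promotion_times(heard: list[float], advbase: int = 1, backup_skew: int = 100) -> list[float]:
    """Times at which the backup would declare the master dead, given the times
    (in seconds, starting at 0) when the master's advertisements actually arrived."""
    limit = master_down_interval(advbase, backup_skew)
    promotions = []
    last_heard = 0.0
    for t in sorted(heard):
        if t - last_heard > limit:
            # Gap too long: the backup becomes MASTER while the real master is still up.
            promotions.append(last_heard + limit)
        last_heard = t
    return promotions


if __name__ == "__main__":
    sent = list(range(11))  # master (advskew 0) advertises once per second
    lost = {3, 4, 5, 6, 7}  # e.g. dropped on a path that needs the peer link
    heard = [t for t in sent if t not in lost]
    print(promotion_times(sent))   # []
    print(promotion_times(heard))  # [5.390625]
```

In this model, both nodes claim MASTER from about 5.4 s until advertisements get through again, and the pair then has to settle who is master once more.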

We upgraded our core switches from Pro Aggregation switches, so we ended up dedicating one of those Pro Aggregation switches to sit in front of both firewalls. I hate that it reintroduces a single point of failure, but at least both firewalls are connected to the same switch fabric. That Pro Aggregation switch is MC-LAGged back to the ECS-Aggregation pair. Maybe in the future we’ll have the courage to connect the firewalls directly via MC-LAG. We’ve also exhausted our limit of 26 MC-LAG target groups, so we’ll need to wait for a future firmware version before even attempting it.

bmill · July 24, 2026, 5:05pm

I am looking at a similar setup. I’m curious: how has this worked out for you?

markadavis · July 24, 2026, 9:16pm

Quoting xMAXIMUSx: “I hope you are running TNSR to get the throughput of 25G. The kernel is going to get in the way beyond 10G. VPP is going to be your friend.”

Your point is valid: why would brainjake install 25Gb cards in a Netgate 8300? It isn’t going to keep up with that under pfSense. It would only make sense if using TNSR.

For most sites, going beyond 10Gb isn’t needed on a pfSense Netgate. Only internet traffic and cross-VLAN routing go through the Netgate. Ours has dual 10Gb adapters (added in addition to the other ports), and those are LAGged to the switches. With typical setups, ours included, only a small fraction of the total traffic has to go through the Netgate. All the rest of the 25Gb traffic goes directly between switches, or to/from local devices without crossing VLANs.

LTS_Tom · July 25, 2026, 11:52am

What you run into with pfSense is not that it can’t do over 10G or even 25G, but that it is CPU-bound per stream, per thread. For example, a single stream may top out at 2G on one CPU core, while 5 streams at 2G each get the full 10G.
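Tom’s example can be written as a back-of-the-envelope ceiling. This is a simplification built on three assumptions. Each stream stays on one core. Streams hash evenly across cores, although real RSS hashing is rarely that even. The per-core figure is constant. The 8-core count is also an assumption for illustration.

```python
def aggregate_gbps(streams: int, per_core_gbps: float, cores: int, link_gbps: float) -> float:
    """Rough throughput ceiling when each stream is pinned to one CPU core.

    Assumes streams spread evenly across cores, so at most `cores` streams run
    at full per-core speed at once; the link rate caps the total.
    """
    active = min(streams, cores)
    return min(link_gbps, active * per_core_gbps)


if __name__ == "__main__":
    # Tom's example: 2G per core on a 10G link (8 cores assumed).
    print(aggregate_gbps(1, 2.0, 8, 10.0))  # 2.0
    print(aggregate_gbps(5, 2.0, 8, 10.0))  # 10.0
```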

markadavis · July 25, 2026, 1:11pm

Yeah, that is a consideration as well. It’s a shame it can’t go faster, but it seems to be a limitation in BSD. Most sites seem to say it tops out at 1Gbps per stream, while Linux can do about 10Gbps. I wonder if that will ever improve?

Since things like web browsers typically open multiple streams, and a multi-user system can have hundreds or thousands of streams, the limitation isn’t as bad as it sounds. NFS under Linux can be told to mount with 16 streams (although I have never tried it). There is also information on using other programs like GNU parallel to split rsync jobs across several streams (again, I have never tried that).
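The rsync idea comes down to splitting one big transfer into several independent jobs so that each runs in its own TCP stream. This is a minimal sketch that assumes a flat list of files with known sizes. It uses a greedy largest-first split to balance bytes across jobs, then builds one `rsync -a` command line per job. The destination and function names are hypothetical. The commands are only built, not run; they could be launched concurrently with GNU parallel or `subprocess.Popen`.

```python
import heapq


def partition_by_size(files: dict[str, int], jobs: int) -> list[list[str]]:
    """Greedy split: the largest remaining file goes to the currently lightest job."""
    heap = [(0, i) for i in range(jobs)]  # (bytes assigned so far, job index)
    buckets: list[list[str]] = [[] for _ in range(jobs)]
    for name, size in sorted(files.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)
        buckets[i].append(name)
        heapq.heappush(heap, (load + size, i))
    return buckets


def rsync_commands(files: dict[str, int], dest: str, jobs: int) -> list[list[str]]:
    """One rsync argv per non-empty job; each would open its own connection."""
    return [["rsync", "-a", *bucket, dest] for bucket in partition_by_size(files, jobs) if bucket]


if __name__ == "__main__":
    sizes = {"a.iso": 10, "b.iso": 8, "c.tar": 5, "d.tar": 4, "e.log": 3}  # hypothetical sizes (GB)
    for cmd in rsync_commands(sizes, "backup:/data/", jobs=2):
        print(" ".join(cmd))
```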

markadavis · July 25, 2026, 1:40pm

I work with tmacalp, and we ended up doing what he described in his later post. Rather than connecting both Netgates directly to the dual UniFi Campus switches, we connected them both to a single Agg Pro switch via simple LAGs. That Agg Pro is connected to the rest of the network using MC-LAG to the two Campus switches.

This still means we have a single point of failure between the Netgates and the LAN: the Agg Pro. But at least the Agg Pro can maintain connectivity if either of its connections to the Campus switches fails, or if one of the Campus switches goes down (fails, gets a firmware update, etc.). We also retain redundancy for our VLAN routing, DHCP, DNS, and firewall/Internet access, and testing shows this works OK.

It is not ideal, of course. It would be better if we could safely connect each Netgate through MC-LAG to each Campus switch, for complete redundancy. But the current strange limitation of the Campus switches not passing multicast CARP/VRRP traffic across their peer link made that impossible. That said, we didn’t verify that the limitation actually bites, because this is a live 24/7 network and we don’t have the resources to test.

So… the down side is the single point of failure of the Agg Pro. However:

We have years of experience with the Agg Pros as core switches. They seem to be very reliable; none has ever rebooted on its own or degraded in performance to the point of needing a reboot.
We haven’t had any fail (as in hardware total failure).
We are using multi-redundant power. Primary power runs through one UPS on one circuit backed by the generator, and the redundant power (RPS) cable runs to a different UPS on a different circuit, also backed by the generator. (The same goes for the Campus switches.)
We are using regular LAGs between the Netgates and Agg Pro. This will protect against DAC failure, interface failure, or disconnects.
As mentioned above, we have the Agg Pro connected with MCLAG to each Campus core switch, protecting against single port/DAC/interface/Campus failure.
We have a spare, cold, pre-configured Agg Pro mounted on the rack, directly above the active one. If there is a failure, we should be able to at least get things back up quickly.
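The protections listed above can be checked mechanically. This minimal sketch assumes the topology described: two Netgates LAGged to one Agg Pro, the Agg Pro MC-LAGged to both Campus switches, and the peer link between them. Each LAG member counts as its own link. The sketch removes one element at a time and checks whether at least one firewall can still reach a surviving Campus switch. Node and link names are hypothetical labels, and the model ignores CARP state, VLANs, and the cold spare.

```python
from collections import deque

# Hypothetical model of the topology described in the post.
NODES = ["fw1", "fw2", "aggpro", "ecs1", "ecs2"]
LINKS = [
    ("fw1", "aggpro", "fw1-m1"), ("fw1", "aggpro", "fw1-m2"),  # plain LAG members
    ("fw2", "aggpro", "fw2-m1"), ("fw2", "aggpro", "fw2-m2"),
    ("aggpro", "ecs1", "mclag-1"), ("aggpro", "ecs2", "mclag-2"),
    ("ecs1", "ecs2", "peer-link"),
]


def reachable(start: str, failed: str) -> set[str]:
    """Breadth-first search that skips the failed node or link."""
    if start == failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for a, b, name in LINKS:
            if name == failed or failed in (a, b):
                continue  # link is down, or one of its endpoints is down
            for here, there in ((a, b), (b, a)):
                if here == node and there not in seen:
                    seen.add(there)
                    queue.append(there)
    return seen


def lan_has_gateway(failed: str) -> bool:
    """True if at least one firewall still reaches at least one surviving core switch."""
    cores = {"ecs1", "ecs2"} - {failed}
    return any(reachable(fw, failed) & cores for fw in ("fw1", "fw2"))


def single_points_of_failure() -> list[str]:
    elements = NODES + [name for _, _, name in LINKS]
    return [e for e in elements if not lan_has_gateway(e)]


if __name__ == "__main__":
    print(single_points_of_failure())
```

Under these assumptions, the only element whose loss cuts the firewalls off is `aggpro`, which matches the post’s own conclusion. A firmware update on that switch counts as the same kind of outage.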

So, apart from a total hardware failure of the active Agg Pro, the only problematic time is if/when we need to update the Agg Pro’s firmware. Since it is a stable, older model, that is rare now. When we do, we will need to schedule total network downtime for those few minutes. So far, we have run like this for a few months.

Who knows, maybe Ubiquiti will further update the ECS Campus Aggregation to allow more than 26 MC-LAG groups and solve the stated/apparent problem with CARP/VRRP in the future. At that time we might re-evaluate our current setup.

LTS_Tom · July 26, 2026, 10:36am

On most modern CPUs it’s closer to 5G. Generally, faster clock speeds allow for faster transfers.

markadavis · July 26, 2026, 1:10pm

Quoting LTS_Tom: “On most modern CPUs it’s closer to 5G. Generally, faster clock speeds allow for faster transfers.”

Oh, that is much better.

A Netgate 1537 has an 8-core 1.7 GHz Xeon D-1537 (2.3 GHz turbo), circa 2015. So not ancient, but far from modern. Netgate says:

IPERF3 Traffic: 18.80 Gbps (TCP - 1460 byte payload and TCP framing) [no mention as to how many streams]
IMIX Traffic: 15.22 Gbps (Simple IMIX traffic is sets of seven 40-byte packets, four 576-byte packets, and one 1500-byte packet, plus Ethernet framing overhead.)
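The IMIX figure is easier to compare once it is turned into a packet rate, since forwarding cost is largely per packet. Using the packet mix quoted above, one set averages (7×40 + 4×576 + 1×1500)/12 ≈ 340 bytes. The `overhead_bytes` argument is an assumption. A value of 38 bytes would cover the Ethernet header, FCS, preamble, and inter-frame gap if the quoted sizes are IP packet sizes. Netgate’s footnote doesn’t say exactly how framing is counted.

```python
# Simple IMIX as quoted above: (packets per set, packet size in bytes).
SIMPLE_IMIX = [(7, 40), (4, 576), (1, 1500)]


def average_packet_bytes(mix: list[tuple[int, int]]) -> float:
    """Weighted mean packet size of one IMIX set."""
    total_bytes = sum(count * size for count, size in mix)
    total_packets = sum(count for count, _ in mix)
    return total_bytes / total_packets


def packets_per_second(gbps: float, mix: list[tuple[int, int]], overhead_bytes: int = 0) -> float:
    """Packet rate implied by a bit rate; overhead_bytes is added to every packet."""
    bits_per_packet = (average_packet_bytes(mix) + overhead_bytes) * 8
    return gbps * 1e9 / bits_per_packet


if __name__ == "__main__":
    print(round(average_packet_bytes(SIMPLE_IMIX), 2))  # 340.33
    print(f"{packets_per_second(15.22, SIMPLE_IMIX) / 1e6:.2f} Mpps (no framing)")
    print(f"{packets_per_second(15.22, SIMPLE_IMIX, 38) / 1e6:.2f} Mpps (+38 B/packet, assumed)")
```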

They tend to focus on multistream performance metrics, which I can’t blame them for doing, since that is closer to real-use relevance (plus it looks better). Strangely, I don’t think I have ever tested single-stream forwarding performance on it (I checked my notes).