Hello everyone,
I’ve recently run into a difficult issue with my network setup involving VLANs, and I’m hoping to get some insight or solutions from this knowledgeable community. Here’s a brief overview of my current setup and the specific problem I’m facing.
Host System:
XCP-NG 8.3-beta1 running on a cwwk mini-PC with an Intel i3-N305 CPU and 8 × 2.5G NICs (Intel I226-V). Product page: 6 LAN Firewall Appliance 2.5G Router 12th Gen Intel i3-N305/N100 DDR5 – cwwk
The reason for running 8.3-beta1 is support for the NICs and graphics on this PC; 8.2 does not install at all.
pfSense VM (version 2.7.1) with 4 VIFs configured as follows:
- vif0: External (eth0)
- vif1: Internal (eth1, 192.168.88.1/24)
- vif2: Guests (eth1, 192.168.55.1/24, VLAN 55)
- vif3: SFW (eth1, 172.16.0.1/24, VLAN 20)
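To make the addressing plan above easier to reason about, here is a minimal Python sketch that maps an address to its segment and reports whether traffic between two hosts has to be routed by pfSense. The interface addresses are copied from the list above, and the two example hosts are the ones that appear in the capture further down. The function names are made up for illustration.

```python
import ipaddress

# Interface addresses copied from the pfSense VIF list (vif1..vif3).
INTERFACES = {
    "Internal": ipaddress.ip_interface("192.168.88.1/24"),
    "Guests (VLAN 55)": ipaddress.ip_interface("192.168.55.1/24"),
    "SFW (VLAN 20)": ipaddress.ip_interface("172.16.0.1/24"),
}


def segment_of(address):
    """Return the name of the segment whose subnet contains address, or None."""
    ip = ipaddress.ip_address(address)
    for name, iface in INTERFACES.items():
        if ip in iface.network:
            return name
    return None


def is_routed(src, dst):
    """True when src and dst are on different segments, so pfSense must route."""
    return segment_of(src) != segment_of(dst)


if __name__ == "__main__":
    client, server = "172.16.0.104", "192.168.88.5"
    print(client, "is on", segment_of(client))
    print(server, "is on", segment_of(server))
    print("crosses pfSense:", is_routed(client, server))
```

Only the routed case (different segments) shows the problem described below; traffic that stays inside one segment never passes through the pfSense VM.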
Network Setup:
VLANs are defined in XCP-NG with Xen Orchestra for the pool (one XCP-NG host). TX offloading is disabled for all VIFs connected to pfSense and other FreeBSD VMs.
I have an 8-port UniFi switch (US-8-60W) connected to the host system, and a couple of UniFi APs connected to the switch. These are managed by a UniFi controller running on a Debian VM. In the UniFi controller I have configured three networks and three Wi-Fi networks: default (no VLAN), SFW (VLAN 20) and Guest (VLAN 55).
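For reference, TX offloading on a VIF is usually disabled with `xe vif-param-set` and the `other-config:ethtool-tx` key, which is the approach described in the XCP-ng guide for pfSense/OPNsense guests. The sketch below only builds and prints the command lines and does not run anything. The UUIDs are placeholders, not values from this setup, and the helper name is hypothetical.

```python
def tx_offload_commands(vif_uuids):
    """Build one `xe vif-param-set` argv per VIF to turn TX offload off."""
    commands = []
    for uuid in vif_uuids:
        commands.append(
            ["xe", "vif-param-set", f"uuid={uuid}", "other-config:ethtool-tx=off"]
        )
    return commands


if __name__ == "__main__":
    # Placeholder UUIDs; the real ones come from `xe vif-list` on the host.
    placeholder_uuids = [
        "00000000-0000-0000-0000-000000000001",
        "00000000-0000-0000-0000-000000000002",
    ]
    for argv in tx_offload_commands(placeholder_uuids):
        print(" ".join(argv))
```

As far as I know the setting is only picked up once the VIF is re-plugged or the VM is restarted, so it is worth confirming it is actually in effect after a migration to a new host.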
The Issue:
While general browsing from my laptop works fine over all Wi-Fi networks (and wired), I’m experiencing packet loss and SSH connection drops when communicating between VLANs. When I browse Xen Orchestra from the default network via its SFW (VLAN 20) IP address, I get logged out after a short while. It works fine if I stay within the same VLAN or within the default network’s IP range. This issue didn’t exist in my previous setup on a Supermicro server (XCP-NG 8.2), from which the same VMs were migrated.
Specific Scenario:
When I’m connected to the SFW VLAN (172.16.0.0/24) over Wi-Fi and SSH into a Debian server on the default network (192.168.88.0/24), the connection initially works but hangs after about 15-30 seconds, then drops with: “client_loop: send disconnect: Broken pipe”
Observations and troubleshooting done so far:
- SSH connection to a server within the same VLAN is stable.
- The issue persists across different laptops, over both Wi-Fi and wired connections to the switch.
- Created a bridge in pfSense to be able to connect directly to SFW VLAN via cable. This was stable.
- Assigned a port on the UniFi switch to the SFW VLAN to connect directly via cable. This was NOT stable.
- Replacing the switch with a Netgear GS108 did not resolve anything.
- Fresh installation and configuration of pfSense did not resolve anything.
- Disabled specific rules in pfSense to allow unrestricted inter-VLAN communication.
- Wireshark captures show “Spurious Retransmissions” and “Duplicate ACKs” right before the disconnect.
Wireshark capture from just before and during the SSH hang:
No. Time Source Destination Protocol Length Info
468 17.397715 192.168.88.5 172.16.0.104 SSH 166 Server: Encrypted packet (len=100)
469 17.397718 192.168.88.5 172.16.0.104 SSH 102 Server: Encrypted packet (len=36)
470 17.397959 172.16.0.104 192.168.88.5 TCP 66 64059 → 22 [ACK] Seq=1 Ack=2449 Win=2048 Len=0 TSval=2342211817 TSecr=1567302730
550 18.421836 192.168.88.5 172.16.0.104 SSH 166 Server: Encrypted packet (len=100)
551 18.421839 192.168.88.5 172.16.0.104 SSH 102 Server: Encrypted packet (len=36)
552 18.422079 172.16.0.104 192.168.88.5 TCP 66 64059 → 22 [ACK] Seq=1 Ack=2585 Win=2048 Len=0 TSval=2342212841 TSecr=1567303754
560 18.465260 192.168.88.5 172.16.0.104 TCP 102 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2549 Ack=1 Win=501 Len=36 TSval=1567303797 TSecr=2342211817
561 18.465433 172.16.0.104 192.168.88.5 TCP 78 [TCP Dup ACK 552#1] 64059 → 22 [ACK] Seq=1 Ack=2585 Win=2048 Len=0 TSval=2342212884 TSecr=1567303797 SLE=2549 SRE=2585
564 18.685180 192.168.88.5 172.16.0.104 TCP 202 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2449 Ack=1 Win=501 Len=136 TSval=1567304017 TSecr=2342211817
565 18.685384 172.16.0.104 192.168.88.5 TCP 78 [TCP Dup ACK 552#2] 64059 → 22 [ACK] Seq=1 Ack=2585 Win=2048 Len=0 TSval=2342213104 TSecr=1567304017 SLE=2449 SRE=2585
574 19.125225 192.168.88.5 172.16.0.104 TCP 202 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2449 Ack=1 Win=501 Len=136 TSval=1567304457 TSecr=2342211817
575 19.125462 172.16.0.104 192.168.88.5 TCP 78 [TCP Dup ACK 552#3] 64059 → 22 [ACK] Seq=1 Ack=2585 Win=2048 Len=0 TSval=2342213544 TSecr=1567304457 SLE=2449 SRE=2585
585 20.083755 192.168.88.5 172.16.0.104 TCP 202 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2449 Ack=1 Win=501 Len=136 TSval=1567305353 TSecr=2342211817
586 20.083957 172.16.0.104 192.168.88.5 TCP 78 [TCP Dup ACK 552#4] 64059 → 22 [ACK] Seq=1 Ack=2585 Win=2048 Len=0 TSval=2342214503 TSecr=1567305353 SLE=2449 SRE=2585
615 21.728818 172.16.0.104 192.168.88.5 WebSocket 80 WebSocket Text [FIN] [MASKED]
616 21.733162 192.168.88.5 172.16.0.104 WebSocket 76 WebSocket Text [FIN]
617 21.733325 172.16.0.104 192.168.88.5 TCP 66 64164 → 80 [ACK] Seq=1019 Ack=419 Win=131328 Len=0 TSval=2478428867 TSecr=1567307065
618 21.781205 192.168.88.5 172.16.0.104 TCP 202 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2449 Ack=1 Win=501 Len=136 TSval=1567307113 TSecr=2342211817
619 21.781357 172.16.0.104 192.168.88.5 TCP 78 [TCP Dup ACK 552#5] 64059 → 22 [ACK] Seq=1 Ack=2585 Win=2048 Len=0 TSval=2342216200 TSecr=1567307113 SLE=2449 SRE=2585
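A rough way to read the capture above is to pull out the retransmitted segments and look at two things: how the gaps between retransmissions grow, and which timestamp the server keeps echoing back (TSecr). The Python sketch below does that for a subset of the lines, copied verbatim from the capture. It assumes the plain-text column layout shown above (frame number, time, source, destination, ...). The function names are made up for illustration.

```python
import re

# Subset of the capture above, copied verbatim.
CAPTURE = """\
470 17.397959 172.16.0.104 192.168.88.5 TCP 66 64059 → 22 [ACK] Seq=1 Ack=2449 Win=2048 Len=0 TSval=2342211817 TSecr=1567302730
552 18.422079 172.16.0.104 192.168.88.5 TCP 66 64059 → 22 [ACK] Seq=1 Ack=2585 Win=2048 Len=0 TSval=2342212841 TSecr=1567303754
560 18.465260 192.168.88.5 172.16.0.104 TCP 102 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2549 Ack=1 Win=501 Len=36 TSval=1567303797 TSecr=2342211817
564 18.685180 192.168.88.5 172.16.0.104 TCP 202 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2449 Ack=1 Win=501 Len=136 TSval=1567304017 TSecr=2342211817
574 19.125225 192.168.88.5 172.16.0.104 TCP 202 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2449 Ack=1 Win=501 Len=136 TSval=1567304457 TSecr=2342211817
585 20.083755 192.168.88.5 172.16.0.104 TCP 202 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2449 Ack=1 Win=501 Len=136 TSval=1567305353 TSecr=2342211817
618 21.781205 192.168.88.5 172.16.0.104 TCP 202 [TCP Spurious Retransmission] 22 → 64059 [PSH, ACK] Seq=2449 Ack=1 Win=501 Len=136 TSval=1567307113 TSecr=2342211817
619 21.781357 172.16.0.104 192.168.88.5 TCP 78 [TCP Dup ACK 552#5] 64059 → 22 [ACK] Seq=1 Ack=2585 Win=2048 Len=0 TSval=2342216200 TSecr=1567307113 SLE=2449 SRE=2585
"""

FIELD = re.compile(r"(\w+)=(\d+)")


def parse_line(line):
    """Turn one exported capture line into a dict of the fields used below."""
    parts = line.split()
    row = {key: int(value) for key, value in FIELD.findall(line)}
    row["time"] = float(parts[1])
    row["src"] = parts[2]
    row["retransmission"] = "Spurious Retransmission" in line
    return row


def analyse(capture, server="192.168.88.5"):
    """Summarise retransmission gaps and the timestamps the server echoes."""
    rows = [parse_line(line) for line in capture.splitlines() if line.strip()]
    retrans = [r for r in rows if r["retransmission"] and r["src"] == server]
    client = [r for r in rows if r["src"] != server]

    # Group retransmission times by sequence number, then take the gaps.
    times_by_seq = {}
    for r in retrans:
        times_by_seq.setdefault(r["Seq"], []).append(r["time"])
    gaps = {
        seq: [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]
        for seq, times in times_by_seq.items()
    }
    return {
        "echoed_tsecr": {r["TSecr"] for r in retrans},
        "newest_client_tsval": max(r["TSval"] for r in client),
        "gaps": gaps,
    }


if __name__ == "__main__":
    print(analyse(CAPTURE))
```

On these lines, the gaps between retransmissions of Seq=2449 grow with each attempt (roughly 0.44 s, 0.96 s, 1.70 s), which looks like ordinary retransmission-timer backoff. Every retransmission also echoes TSecr=2342211817, the TSval of frame 470, even though the client sent newer ACKs afterwards (frame 552 onwards). One possible reading, assuming the capture was taken on the client side, is that the client's ACKs stop reaching the server, which would point at loss in the SFW-to-default direction rather than the other way round.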
I’ve spent considerable time troubleshooting this without success. Any insights, suggestions, or similar experiences would be greatly appreciated.
Thanks in advance!