Issue with pfSense on XCP-ng using tagged networks: cannot resolve hosts on peer XCP-ng hosts, and latency on external hosts

eric.techdev · September 3, 2022, 8:23pm

I have configured vLAN-tagged networks that connect to a trunk bonded across two NICs. Each ISP is on its own interface.

I have 6 of these set up:
WAN network → pfSense on XCP-ng → XCP-ng network vLAN tagged and connected to redundant trunk

x.x.x.10 (WAN) → x.x.10.1 (pfSense) → vLAN 10

The trunk carries the management network untagged as its native network.

If I connect a VM to this same tagged network on the XCP-ng host, it can see the router and I can log in to the UI, but it cannot get out to the internet. If I connect via a VM or physical host outside of XCP-ng, I can connect but the connection is very slow (~20 Mb/s). If I connect directly to the WAN on the XCP-ng hosts, there is no latency.

I had a similar setup without the vLANs on ESXi 7 that I transitioned to XCP-ng and vLANs, and it did not face any latency issues running on the VMware stack (it also did not use vLAN tags).

I suspect that I have done something wrong with vLAN tagging or best practices, since this is new tech in my stack that I am still learning. XCP-ng is working for the other hosts and setups; the only real problem with this pfSense (one-to-many + to-one + balanced gateway) approach seems to be somewhere in the network stack and the configs I have implemented. I could be wrong though.

Intended application: I need to set up a router for each address in the ISP IP blocks, so that each one provides a unique gateway and an additional router can balance outbound traffic across them.

I am sure it is just something simple I am doing wrong, but I am at a loss for what I did that causes this behavior.

eric.techdev · September 3, 2022, 10:16pm

@LTS_Tom Any chance you want to toss together a video on “best approach to assign 8-address IP blocks from 2 redundant ISPs (ATT + Comcast) to vLANs and a vLAN/router that balances them all using pfSense and XCP-ng”, potentially “with supernetting”? Your videos are super helpful and that is what led me here in the first place.

If I had not spent my entire budget on hardware for the project, I would look at contracting you to consult on the work. Right now I am in a pinch and have to do what I can to sort it out and get the system live for our users. I also need to learn this so that I am able to maintain the beast, and I am in the 11th hour.

I am thinking this can all be done with 2 pfSense routers in a complex configuration, but I am approaching it with individual virtual pfSense VMs that host unique gateway addresses for the next router to use to balance the traffic across the addresses.

I want to be able to assign some hosts to the individual IP vLANs and some to the balanced one(s) so that I can deploy a robust set of solutions on the setup.

I am using:
-ATT Gb fiber w/ 8 block IP
-Comcast Business 50 Mb w/ 8 block IP
-2x GS308E - 8-Port Gigabit Ethernet Smart Managed Plus Switch
-Dell R620 - 20 core - 88 GB DDR3 RAM - ESXi 7 - 8x 1 Gb/s NICs - 2.5 TB (4x 900 GB) RAID 6 - dual SD (OS)
-Dell R720 - 24 core - 128 GB DDR3 RAM - XCP-ng 8.2 - 6x 1 Gb/s NICs - 15 TB RAID 6 - 500 GB NVMe - 180 GB SSD SATA (OS)

Pending config/rollout:
-Dell R830 - 64 core - 1 TB RAM - 2x Intel 10 Gb SFP+ - 4x 1 Gb/s NICs - 25 TB RAID 6 (26x 1.2 TB) - 180 GB SSD SATA (OS)
-Dell R720 - 24 core - 64 GB RAM - 4x 1 Gb/s NICs - 15 TB RAID 6 (16x 1.2 TB) - 120 GB SSD
-4x 1 Gb/s port expansion card
-a handful of mixed SSD SATA drives

Due to the way that XCP-ng works (name resolution), I have to run the management network on the ESXi host for now. If I have it on the XCP-ng server with my AD DCs, it cannot start anything since it cannot resolve its own hostname for some reason. I would like to have all the hosts on XCP-ng, and I plan to run faasd with MinIO (as a Lambda replacement) and potentially TrueNAS to use the storage more widely on that host.

The reason behind this configuration is to extend the IP-constrained external API limits on several endpoints that data is being collected from. We have already optimized the code to reduce the total calls and cache relevant data, but we need to expand our use of the APIs that provide data and user interactions at various endpoints we do not control. Some of them we could get whitelisted for higher use, but others are not worth the bother, since we are not abusing them but using them as intended at scale, with code that keeps in mind that they are a finite resource.

I want to set up some of the failover and redundancy features across the 3x XCP-ng hosts in the future as well, but that is not really the issue that I am facing at the moment.

LTS_Tom · September 5, 2022, 11:18am

I don’t have a recent video on the topic, but here is the latest pfSense documentation on load balancing:
https://docs.netgate.com/pfsense/en/latest/multiwan/load-balance-and-failover.html

eric.techdev · September 5, 2022, 6:07pm

Thanks for that link, but I already have an approach that works for that portion (load balancing). I have to make sure that each of the IPs has a gateway to ensure that they can be balanced.

The real issue I am facing is the bit about hosting these networks on vLAN segments that are on the XCP-ng server. I think that is the portion that is not working as intended.

When I set up the pfSense routers on XCP-ng on vLANs, they are not working. When I set up the balanced solution without vLANs on ESXi, it works as intended.

Do you have any ideas as to what I may be doing wrong in this portion of the setup?

I want to use the vLAN/trunk so that I am able to expand this setup to a 128 block of IPs in the near future.
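
For a rough sense of the address math behind that expansion, here is a minimal Python sketch using the standard `ipaddress` module. It assumes the "8 block" is a /29 and the "128 block" is a /25, and it uses the documentation range 203.0.113.0 as a stand-in, since the real ISP prefixes are not given in this thread. The helper names `block_summary` and `carve` are made up for illustration.

```python
import ipaddress


def block_summary(cidr):
    """Return total addresses, usable host addresses, and netmask for a block."""
    net = ipaddress.ip_network(cidr)
    return {
        "total": net.num_addresses,
        # hosts() leaves out the network and broadcast addresses
        "usable": len(list(net.hosts())),
        "netmask": str(net.netmask),
    }


def carve(supernet, new_prefix):
    """Split a larger block into equal subnets of the given prefix length."""
    net = ipaddress.ip_network(supernet)
    return [str(sub) for sub in net.subnets(new_prefix=new_prefix)]


if __name__ == "__main__":
    # Placeholder prefixes (documentation range), not the real ISP blocks.
    print(block_summary("203.0.113.0/29"))
    print(block_summary("203.0.113.0/25"))
    print(len(carve("203.0.113.0/25", 29)), "x /29 inside one /25")
```

Under those assumptions, a /25 holds sixteen /29-sized pieces, which is one way to picture how the per-vLAN layout could scale. How many of the usable addresses the ISP reserves for its own gateway is not covered here.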

eric.techdev · September 5, 2022, 8:07pm

I think it may revolve around the “Disable TX Checksum Offload” setting. I will test this in a bit.

eric.techdev · September 5, 2022, 10:59pm

@LTS_Tom - It seems the above setting I mentioned, “Disable TX Checksum Offload”, is what I missed. It is all working as intended… so far.

Found the solution here: How to install pfSense in a VM
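
For anyone landing here later: as far as I understand it, on the XCP-ng side TX checksum offload is turned off per virtual interface (VIF) with `xe vif-param-set` and the `other-config:ethtool-tx` key, and the VM needs a full shutdown and start afterwards for it to take effect. With six pfSense VMs that is a lot of VIFs, so here is a small Python sketch that only builds the command lines for review. The function name and the UUIDs are placeholders; real VIF UUIDs would come from `xe vif-list vm-uuid=<VM UUID>` on the host.

```python
import shlex


def tx_offload_commands(vif_uuids):
    """Build one `xe vif-param-set` command line per VIF UUID.

    Nothing is executed here; the strings are meant to be reviewed
    and then run on the XCP-ng host.
    """
    commands = []
    for uuid in vif_uuids:
        args = [
            "xe",
            "vif-param-set",
            f"uuid={uuid}",
            "other-config:ethtool-tx=off",
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Placeholder UUIDs for illustration only.
    example_vifs = [
        "00000000-0000-0000-0000-000000000001",
        "00000000-0000-0000-0000-000000000002",
    ]
    for line in tx_offload_commands(example_vifs):
        print(line)
```

This only prepares the host-side commands. Any checksum offload option inside the pfSense UI is a separate setting and is not touched by this sketch.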

I am not sure how I missed this detail in my initial scoping, but w/e it is now a known.

Thanks for taking a look with me on this adventure and I hope this helps a lost soul in the future.