Questions and concerns with secondary routers, pfSense, etc.

So I’m refining my network to solve some problems. The initial network looks kinda like the Old Network highlighted in Blue. Some details have been changed or omitted for relevance and simplicity.

Notice how one pfSense is handling all the subnets. This is great for managing DHCP reservations, inter-VLAN policy, etc., but it introduces some limitations that I particularly want to solve.

I have a TLD reserved for internal DNS use. I generally want a given hostname in a given network to get assigned an FQDN per the table below:

Hostname  VLAN  FQDN
host1     LAN   host1.my.tld
host2     LAN   host2.my.tld
server1   SRV   server1.srv.my.tld
server1   DEV   server1.dev.my.tld
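
To make the naming rule concrete, here is a minimal Python sketch of the mapping I'm after. The VLAN_LABEL dict and the fqdn() helper are names I made up for illustration; this is not anything pfSense exposes.

```python
# Illustrative only: the naming scheme from the table above.
BASE_DOMAIN = "my.tld"

# Hypothetical mapping of VLAN name to DNS label; LAN uses the bare domain.
VLAN_LABEL = {"LAN": "", "SRV": "srv", "DEV": "dev"}


def fqdn(hostname: str, vlan: str) -> str:
    """Build the FQDN a host should get based on the VLAN it lives in."""
    label = VLAN_LABEL[vlan]
    parts = [hostname, label, BASE_DOMAIN] if label else [hostname, BASE_DOMAIN]
    return ".".join(parts)


if __name__ == "__main__":
    for host, vlan in [("host1", "LAN"), ("server1", "SRV"), ("server1", "DEV")]:
        print(host, vlan, fqdn(host, vlan))
```

The point is that the same hostname in two VLANs should yield two different FQDNs.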

This generally doesn’t work with pfSense. Every DNS reservation is placed in just hostname.my.tld, regardless of what settings I apply on the DHCP server or DNS resolver pages. After some inquiry in various pfSense and Networking forums, I learned that the only solution is to have a separate DHCP and DNS server for the separate DEV and SRV subnets. To that end, I decided to make them separate routed subnets with their own private gateway routers. The new updated layout, migration in progress, will look something like the New network highlighted in green above. Again, some details have been changed or omitted for relevance and simplicity.

There is a /31 link between the main pfSense router and each of the routers for the SRV and DEV subnets. The DEV and SRV subnets are all virtual, and the routers are VM appliances right now. I have DEV converted, but I haven’t touched SRV yet, for no particular reason. I’m still trying to solve some hiccoughs with the DEV network. I’m running BGP over that link to enable auto-discovery of the paths to various networks from the main LAN. The main purpose of using BGP instead of a static route is to enable a future Kubernetes server running KubeVIP to advertise its endpoints over BGP, which is one of the marketed ideal ways for it to work in HA mode, and then have direct access to that K8s cluster from the LAN subnet, even as it grows and evolves.

This absolutely solves all my DHCP and DNS issues, however, it introduces some others.

First: Managing firewall and routing rules for the subnets, including NAT, and access to and from the various devices now requires touching TWO routers and firewalls in some cases, and in the case of DEV<->SRV traffic (used for allow-listing a development machine to access the Git server in the SRV network), it requires touching 3! (Yes, I know I could just globally allow access to git.srv.my.tld from the DEV subnet, but principle of least access, granular controls, etc.). This includes needing to manually add the DEV and SRV subnets as sources for outbound NAT (source NAT) on the main pfSense; otherwise, they can’t access the internet. What are some tips to minimize the number of things I need to touch while simultaneously maintaining a single source of Security Truth, or to better manage coordinating the security updates between devices and subnets?
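
To show where the "two routers, sometimes three" count comes from, here is a small Python sketch. It assumes every routed subnet hangs off the main pfSense over its own link, and the router names (pfsense, dev-router, srv-router) are placeholders rather than real device names.

```python
# Illustrative only: which firewalls a flow crosses in the new layout,
# assuming DEV and SRV each reach everything else through the main pfSense.
CORE = "pfsense"

# Hypothetical gateway for each network; WAN and LAN sit directly on pfSense.
GATEWAY = {
    "LAN": "pfsense",
    "WAN": "pfsense",
    "DEV": "dev-router",
    "SRV": "srv-router",
}


def routers_on_path(src: str, dst: str) -> list[str]:
    """Return the routers (in order) whose rules affect traffic from src to dst."""
    path = [GATEWAY[src]]
    if CORE not in path:
        path.append(CORE)
    if GATEWAY[dst] not in path:
        path.append(GATEWAY[dst])
    return path


if __name__ == "__main__":
    for src, dst in [("LAN", "SRV"), ("DEV", "WAN"), ("DEV", "SRV")]:
        print(f"{src} -> {dst}: {routers_on_path(src, dst)}")
```

Every entry in the returned list is a place where a rule currently has to be added by hand, which is the coordination problem I'm asking about.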

Second: What router to even run for the virtual subnets? It’s not the highest throughput network, but it does occasionally require streaming large data transfers, such as checking large projects in and out of Git, or pushing large projects to the internal Docker/Kubernetes nodes for testing. I can easily saturate 10GbE when connecting direct. I’ve got 56GbE links, and can almost saturate those when connecting direct, but pfSense, even severely overprovisioned, can only do about 15Gbps routed. It would be absolutely stupid to add 2 additional virtual pfSensei in the network path every time I’m copying data back and forth.

So I tried out VyOS, which seems to be lighter weight and isn’t held back by pfSense’s kernel locking limitations, which allows it to route faster with less hardware, similar to the now unobtainium TNSR. TNSR would be my first pick, since this is exactly the kind of job TNSR is designed for, but again, unobtainium. The third thing I looked at was terminating these VLANs on my switch instead of with a VM router, but my switch, an MSX6036, can only do DHCP relay. DHCP relay would be fine if I weren’t doing this primarily to get proper name resolution and isolation, but it introduces another bonus issue.

So are there any good router recommendations that run VPP/DPDK like TNSR does? I tested that ‘feature’ on VyOS, and it doesn’t support a reboot yet. Nice otherwise. Kinda half baked. Or recommendations other than pfSense and VyOS that are incredibly lightweight, and basically focused on JUST packet forwarding and routing, without a bunch of extra features I don’t need? Would it be better to run a separate dedicated DHCP/DNS server in each subnet, and do the VLAN termination on the switch? If so, what’s a good choice that’s easier to manage?

Third: Should I put all the inter-VLAN routing links on a single shared VLAN, or on their own dedicated VLANs? Obviously, if they’re all on the same VLAN, then DEV and SRV can talk directly to each other without needing to go through the main pfSense box; however, the main pfSense box is the main arbiter of connections. However however, the DEV and SRV routers are VMs, (usually) on the same VM host, so the connection speed between them is limited by the speed of the CPU, not the switch, and not the router. But then I don’t have a single arbiter of security. So how does one balance these decisions? What would YOU do and why?

Fourth (Bonus) issue: How do you set up pfSense to be the DHCP server for a subnet behind a DHCP relay? That could actually work for my IOT VLAN and some of my other Mystery MouskaVLANs, since nothing on them is registered in DNS in any meaningful way, but they do get some static reservations. However, afaict, pfSense only lets you set a DHCP scope for a subnet in which one of its interfaces actually has an IP address. So for example, if I had a VLAN with the subnet 192.168.60.0/24 and it was routed via pfSense out an interface with the IP address 192.168.69.250/31, and I wanted the router on the other side to relay DHCP requests back to pfSense, pfSense could only hand out 192.168.69.251, which is the peer on the routing link, so largely useless. Is there a way to set this up in pfSense? pfSense does include a DHCP relay agent, so I would expect it can also handle being the DHCP server that a relay forwards to.
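
Here is the arithmetic behind that example as a short Python sketch using the standard ipaddress module. The relay address 192.168.60.1 and the select_scope() helper are assumptions for illustration. The helper models how a DHCP server generally picks a scope for relayed requests (by the relay's giaddr, per RFC 2131); it is not a claim about what pfSense allows you to configure.

```python
import ipaddress

# The routing link from the example: pfSense holds .250 on a /31.
link = ipaddress.ip_interface("192.168.69.250/31")
# The subnet I actually want leases for, reachable only via that link.
relayed_subnet = ipaddress.ip_network("192.168.60.0/24")

# A /31 contains exactly two addresses, so the only non-pfSense address is the peer.
link_addresses = list(link.network)
peer = next(addr for addr in link_addresses if addr != link.ip)


def select_scope(giaddr, scopes):
    """Pick the scope whose subnet contains the relay's giaddr, or None."""
    addr = ipaddress.ip_address(giaddr)
    for scope in scopes:
        if addr in scope:
            return scope
    return None


if __name__ == "__main__":
    print("addresses on the link:", link_addresses)
    print("peer on the link:", peer)
    # Hypothetical relay address on the client-facing side of the far router.
    giaddr = "192.168.60.1"
    print("scope from link only:", select_scope(giaddr, [link.network]))
    print("scope with relayed subnet:", select_scope(giaddr, [link.network, relayed_subnet]))
```

With only the link subnet defined as a scope, nothing matches the relay's address, which is why a scope tied to the interface subnet doesn't help here.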

I know this is largely a wall of smaller walls of text, but these are the kind of issues I expect a lot of people have tackled already in the past, and have easy solutions to, and I just don’t know what they are yet, and I hope I’ve explained everything clearly enough to get quick and helpful answers without confusing y’all more. Thanks for taking the time to read some of this!