Self-hosted Overlay Network Router for Globally-routable IP Addresses?

Lucky · January 12, 2022, 4:20am

I am in the market for a self-hosted overlay network router product to transport globally-routable IP addresses, both IPv4 and IPv6. I do not know if such a product even exists. The product would have to operate in the environment described below. I am not sensitive to the form factor of the product. While I’d prefer VMs, I’ll take 19" rackmounts.

Colo Anchor Point and Future Gateway
At the colo, I have 2x legacy IPv4 /24 and 1x IPv6 /48 of globally-routable IP addresses.

Remote Office
At the remote office, I have a range of uplinks, one or all of which can be down at any given time. The uplinks range from DSL, to Starlink, to LTE. From a design perspective, some uplinks will support IPv4 only, some IPv6 only, and some both (poorly). All uplinks can be assumed to be behind a NAT, including the IPv6 uplinks. External IP addresses should be assumed to be changing frequently, with outgoing port numbers subject to change at any time and without notice.

Desired End State
A self-hosted overlay network that makes subnets of the /24s and the /48 available at the remote office, isolating the remote office users from the ever-changing, seething state of the physical links beneath it (short of a complete, simultaneous failure of all uplinks, of course).

Not Required
Any kind of firewalling, port filtering, or NAT. It would be nice if those were included, but such capability can always be added with another router. This post concerns itself solely with achieving physical transport of globally-routable IPv4 and IPv6 packets, however encapsulated, across changing and at times actively hostile underlying links.

Commercial suggestions welcome. If there were an open source solution, I suspect I would have found it by now.

Thanks,
– Lucky

brwainer · January 12, 2022, 5:10am

If you’re looking for a mostly-point-and-click commercial solution, that would be “SDWAN without remote office DIA”. Meaning you get an SDWAN product (this takes care of building tunnels across every uplink combination between each endpoint) and you configure it to use the other end of the tunnels (the datacenter) as the default gateway for user traffic. In the SDWAN industry, “DIA” (Direct Internet Access) means allowing some traffic to leave the remote office via its local internet connections, instead of sending everything to the hub (datacenter / main office). Then at the datacenter you do whatever routing and NATing you want - usually this same SDWAN product will do this for you too.
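As a rough illustration of "use the hub as the default gateway, no DIA", here is a minimal Python sketch of the remote office's route lookup, assuming one local VLAN numbered out of the colo's address space. The prefixes (documentation ranges 192.0.2.0/26 and 2001:db8:10::/64) and the next-hop names `lan-vlan10` and `tunnel-to-colo` are hypothetical placeholders, not taken from any SDWAN product. The point is only that local subnets match a more specific route, while everything else falls through to a default route that points into the tunnel rather than out a local ISP link.

```python
import ipaddress

# Hypothetical remote-office routing table for an "SDWAN without DIA" design.
# Prefixes and next-hop names are illustrative placeholders only.
ROUTES = [
    (ipaddress.ip_network("192.0.2.0/26"), "lan-vlan10"),      # local subnet carved from the colo /24
    (ipaddress.ip_network("2001:db8:10::/64"), "lan-vlan10"),  # local subnet carved from the colo /48
    (ipaddress.ip_network("0.0.0.0/0"), "tunnel-to-colo"),     # IPv4 default via the hub
    (ipaddress.ip_network("::/0"), "tunnel-to-colo"),          # IPv6 default via the hub
]


def next_hop(destination: str) -> str:
    """Return the next hop for a destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        (net, hop)
        for net, hop in ROUTES
        if net.version == addr.version and addr in net
    ]
    # The most specific (longest) matching prefix wins, as on any IP router.
    _, hop = max(candidates, key=lambda item: item[0].prefixlen)
    return hop


print(next_hop("192.0.2.10"))    # stays on the local VLAN
print(next_hop("198.51.100.7"))  # everything else goes to the colo
```

With no DIA there is simply no route pointing at a local uplink; the uplinks only ever carry the tunnel itself.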

The only small-scale option I can point to with some familiarity is Untangle SDWAN, but there are plenty of options. Just be sure to look a little bit beyond the marketing term of “SDWAN”. ZeroTier, for example, will help you build tunnels between locations/devices and provides what appears to be a direct ethernet switched connection between all of them, but it doesn’t have any of the path selection and uplink monitoring features that most “SDWAN” includes. WatchGuard’s SDWAN is the opposite - it has a lot of intelligence for selecting which uplink to use for each type of traffic, but doesn’t do any site-to-site tunneling. SDWAN just means “Software Defined WAN”, so there are a lot of possible features that it might or might not include.

Since you’re just talking about two endpoints (colo and remote office), implementing an SDWAN solution may be overkill. You can “easily” (in terms of complexity) set up the same thing yourself with pfSense, MikroTik RouterOS, etc. at each end. You manually set up tunnels/VPNs between the two (with the colo as the server, since it has fixed IPs that aren’t NATed), and then configure your remote office to use a certain IP at the colo as its default gateway. Some tunnel protocols are better at NAT traversal than others, but it is completely doable.
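Why the colo makes the natural server: it is the only end with a stable address, so the remote office dials out (and keeps sending packets so its NAT mappings stay open), and the colo simply answers wherever those packets last came from. The Python sketch below models that "latest source address wins" bookkeeping. It assumes each packet has already been authenticated elsewhere; the class name, the `timeout_s` value, and the addresses are hypothetical. WireGuard's endpoint roaming works along these lines, but the expiry timeout here is an extra, illustrative touch rather than something WireGuard does.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class PeerState:
    endpoint: tuple[str, int] | None = None  # public (ip, port) as seen after the peer's NAT
    last_seen: float = 0.0


class HubEndpointTracker:
    """Hypothetical colo-side bookkeeping for NATed remote peers.

    The hub never dials out. It replies to whatever public address/port the
    peer's NAT last presented, so NAT rebinding or an uplink switch at the
    remote office is followed automatically.
    """

    def __init__(self, timeout_s: float = 120.0) -> None:
        self.timeout_s = timeout_s  # hypothetical: silence beyond this means "peer gone"
        self.peers: dict[str, PeerState] = {}

    def on_authenticated_packet(self, peer_id: str, src_ip: str, src_port: int,
                                now: float | None = None) -> None:
        """Call only after the packet has been cryptographically verified elsewhere."""
        now = time.monotonic() if now is None else now
        state = self.peers.setdefault(peer_id, PeerState())
        state.endpoint = (src_ip, src_port)  # the most recent source address wins
        state.last_seen = now

    def reply_endpoint(self, peer_id: str, now: float | None = None) -> tuple[str, int] | None:
        """Where to send return traffic, or None until the peer is heard from again."""
        now = time.monotonic() if now is None else now
        state = self.peers.get(peer_id)
        if state is None or state.endpoint is None:
            return None
        if now - state.last_seen > self.timeout_s:
            return None
        return state.endpoint


hub = HubEndpointTracker(timeout_s=30)
hub.on_authenticated_packet("office", "203.0.113.5", 40001, now=0.0)
hub.on_authenticated_packet("office", "198.51.100.9", 51234, now=12.0)  # uplink changed, new NAT mapping
print(hub.reply_endpoint("office", now=20.0))  # ('198.51.100.9', 51234)
```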

Lucky · January 12, 2022, 7:18pm

Bruce,
Thank you so much for taking the time to write such a detailed answer!

Some follow-up questions are below:

If you’re looking for a mostly-point-and-click commercial solution, that would be “SDWAN without remote office DIA”.

I would tremendously appreciate names of manufacturers and specific products. Must be self-hosted, no cloud/MITM involved.

The only small-scale option I can point to with some familiarity is Untangle SDWAN

Having just looked at their datasheet, it appears that one would deploy a “Micro Edge” device per “remote office”, an “NG Firewall” at HQ/the colo, and then there is a “Command Center”. Are those three different devices? What’s the rough aggregate cost of those devices, all-in?

Drilling down on the “Micro Edge” device’s wired WAN uplinks, the top tier features a “2x GbE / SFP Combo”. It is not entirely clear to me if that means that this enables the use of three separate WAN uplinks or if this is marketing speak for “two WAN uplinks and you get to pick the connectors”.

If there are only two uplinks, the device would not suffice for my needs. Three WANs would help dig me out of the hole that I am in today. But if I invest in hardware, I’d like at least four WANs, five would be better.

I totally understand your point regarding ZeroTier. It is a fantastic tool for what I use it for, which is to provide occasional networking support to non-techie friends: I drop off an old laptop with ZeroTier and a Rumble Agent on it and before long can fix whatever plagues them from the comfort of my couch.

WatchGuard I was unfamiliar with. You are correct that absent tunneling that’s not what I am looking for.

I looked long and repeatedly at OpenMPTCProuter and have nothing but respect for Ycarus. Still, OpenMPTCProuter is duct-taping together three completely different technologies to achieve its goals and in the end I’d still be left with an OpenWrt-based router. For me, that’s an unlikely path to happiness.

So what’s the next step up in number of WAN connections from Untangle?

(I do remain curious how much even an Untangle solution would cost for one remote site all-in and if that entails two or three WAN uplinks).

Big Thanks!
– Lucky

FredFerrell · January 12, 2022, 10:54pm

If you are going to establish tunnels from the office to the colo across each ISP, I would probably look to set up IPsec tunnels across the three ISPs and use BGP to determine availability. This can be done on cheap Cisco routers if needed. I would put the router in front of your firewalls and just let them handle the transport between your two sites.

What firewall/routers do you have in place now?

brwainer · January 13, 2022, 12:21am

Untangle SDWAN and Untangle NGFW are unrelated products. Yes, they do often show them being used together - SDWAN bringing traffic into a datacenter or main office from remote offices, and then all the traffic to/from the unfiltered internet going through a separate NGFW. But their SDWAN product has sufficient routing and basic firewall chops that it probably suffices for you anyway.

Command Center is their cloud component that is included with the cost of any subscription (actually some features are available even for free with an unlicensed NGFW install). Aside from providing cloud-based remote access, its main purpose is to push the same policies out to a bunch of remote offices at once. Since you’re only talking about one remote office, I don’t think you’ll “need” to use it, but it will be nice to have. Connecting to the cloud is not required for Untangle products to work, other than for checking their license status. Command Center is not part of any network decisions; if you choose to use it to push down policies, it only helps you apply the same config to multiple devices simultaneously.

Untangle Micro Edge (which I didn’t realize they renamed their SDWAN product to until just now), just like Untangle NGFW, can actually be installed on any hardware of your choosing, you don’t have to pick one of their preconfigured appliances. Additionally, you aren’t limited to the physical ports on the device, you can connect more WANs to a switch, put each WAN in its own VLAN, and then use the VLANs as WAN interfaces.

It looks like they’ve dumbed down the “install it yourself” offering by not making a generic installation ISO available, and are instead offering a “Virtualization” option with premade VMware images. So what you can do is purchase a generic server, install ESXi on it (which has a free license option that is totally sufficient for your needs), and then either pass through a bunch of NICs (preferably Intel) to the VM, or do what I suggested before with VLANs and make a bunch of virtual interfaces, one per VLAN.

I think I made one assumption that’s untrue, now that I’m looking closer at it. I thought that Untangle Micro Edge helped automatically build tunnels and provided a one-click “use X location as the gateway for all user traffic” option. You can definitely use Untangle Micro Edge to create tunnels and then make a routing policy to use those tunnels collectively as a gateway, but it will be more manual than I realized.

Assuming you do two virtualized MicroEdge installations (one at each end), and you don’t also have an Untangle NGFW install at the datacenter, you’re looking at $162x2=$324/year for 100Mb, or $238x2=$476/year for unlimited bandwidth. The license only cares about bandwidth, not number of WANs/tunnels/etc. All the pricing is publicly visible at Configurator | Edge Threat Management – Arista
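The licence arithmetic above as a trivial Python check, using the per-install yearly prices quoted in the post (January 2022 figures; current pricing may differ):

```python
# Per-install yearly Micro Edge prices quoted above, in USD.
PRICE_PER_INSTALL_USD = {"100Mb": 162, "unlimited": 238}
INSTALLS = 2  # one virtualized install at each end (remote office + datacenter)


def annual_cost(tier: str, installs: int = INSTALLS) -> int:
    """Yearly licence cost: the licence is per install and only depends on the bandwidth tier."""
    return PRICE_PER_INSTALL_USD[tier] * installs


for tier in PRICE_PER_INSTALL_USD:
    print(f"{tier}: ${annual_cost(tier)}/year")
```

The same function covers other site counts, e.g. `annual_cost("unlimited", installs=3)` for three installs.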

The actual product names I know would be way, way overkill for you… Right now I’m working at a Fortune 500 and we’re using Versa Networks SDWAN. That product is competitive against offerings from Cisco and Fortinet. Meraki SDWAN would be a viable option, but that’s probably way too much cloud for you even though no traffic goes to the cloud.

Yeah, I’ve looked at that project (OpenMPTCProuter) before for personal and nonprofit projects, but never got up the nerve to set it up. I’ve mostly gotten away with ZeroTier and/or EoIP tunnels, and some sort of routing on top of them. It’s very much manual effort, but it is something that’s comfortable for me to set up and maintain.

brwainer · January 13, 2022, 12:25am

This is really what all the big-name SDWAN products are doing in the background anyway - Versa, Cisco, Fortinet, Meraki. It’s just IPSec tunnels and BGP. You can also do this with pfSense, MikroTik, Ubiquiti EdgeRouter, etc.

FredFerrell · January 13, 2022, 2:22am

One interesting protocol with Cisco to check out is PfR. It was probably one of the first to offer application performance based routing long before SDWAN was a thing. I’m pretty sure Viptela uses it. My one knock on SDWAN solutions is they generally cost more than the cost of higher bandwidth circuits. If you have enough bandwidth then why the need for SDWAN?

Lucky · February 21, 2022, 3:31pm

@FredFerrell @brwainer

First and foremost, thank you both tremendously for your insights and taking the time out of your busy days to fill in this knowledge gap of mine. My apologies for the slow reply. $dayJob got in the way.

It is unclear to me how to reply to more than one individual in the To: line of a post. Since your posts are so tightly linked, I am going to @mention both of you in the hope that you’ll both see this.

My project that we have been discussing this weekend went from “desirable future” to “decision needed immediately”.

My pfSense router at home, where the link aggregation is desired, suffered an irreparable hardware failure, requiring hardware replacement over the next few weeks. I am currently running pfSense on the only other 3+ NIC system that I have access to at present: a Dell PowerEdge R730xd with 256GB of ECC RAM. Not a tenable situation, since I need that machine for other purposes. I can live with it for a week or three.

What firewall/routers do you have in place now?

All my firewalls/routers are pfSense and have been since the project forked from M0n0wall. My choice of pfSense is more a function of my age and familiarity with the product than anything else. I have passing experience with deploying a few USGs at the houses of friends where cost was paramount. I am however open to replacing pfSense with another solution, learning curve pain notwithstanding, since I need reliability and speed more than anything.

My House
At my house I was for historical reasons using one of those 4-NIC Pico PCs:
Intel(R) Atom™ CPU E3845 @ 1.91GHz, 4 CPUs: 1 package(s) x 4 core(s).
This is the router that just died that I am stopgapping with the R730xd.

Even if I do not change my current pfSense setup, I would want to buy server hardware that can, in the future, handle the full load of external IPSec traffic (let’s call it an aggregate of 3Gbps) to two sites: the bulk of the traffic will be routed via a physically nearby ISP coop, and a smaller percentage via a somewhat farther-away second ISP coop, which will also serve as the failover for the first ISP coop.

3Gbps sounds like a good target number with some margin: adding up all the links (assuming they all work), I am looking at downlink speeds of 2x 250Mbps (2x VDSL), 1x 270Mbps (Starlink on a really good day), and a whopping, highly variable, IPv6-only link with a peak of 700Mbps down via a 5G modem that recently replaced my old glacial LTE failover modem.

Call it an aggregate IPSec-encrypted downlink of around 2.5Gbps. So 3Gbps to be on the safe side.
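For reference, summing the peak downlinks listed above in Python gives 1,470 Mbps, so a 3Gbps target leaves roughly 2x margin over the raw link total. The sketch only adds the quoted numbers; it does not model IPsec overhead, uplink traffic, or the large variability of the Starlink and 5G links.

```python
# Peak downlink figures quoted above, in Mbps (Starlink and 5G vary widely in practice).
DOWNLINKS_MBPS = {
    "VDSL-1": 250,
    "VDSL-2": 250,
    "Starlink": 270,
    "5G (IPv6-only)": 700,
}

total_mbps = sum(DOWNLINKS_MBPS.values())
target_mbps = 3000

print(f"Sum of listed downlinks: {total_mbps} Mbps")
print(f"3Gbps target vs. sum: {target_mbps / total_mbps:.2f}x")
```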

That’s before any routing between my local VLANs, virtually all of which, apart from those for IoT devices, use 10Gbps links. The external IPSec requirements alone would place my home pfSense router into at least Netgate 1537 territory. For that price, I believe that I can pick up a used 1U Dell, stick in a couple of SFP+ NICs, put a switch in front, install quiet fans, and get a real DRAC (which I happen to be a big fan of) included. Dimensioning suggestions for which PowerEdge to buy are appreciated. I’d prefer something no more than two generations back. In the end I will need three physical routers, one per site, which I would prefer to be identical hardware.

@brwainer I am not (yet) familiar with how one would use BGP to split apart one and the same TCP connection, spread it across links, and stitch it together at the other end. A pointer to introductory material on that aspect of BGP would be appreciated. I can consult with the networking experts at one of my coops for further advice, but I would like to ask this set of questions with at least some baseline knowledge in my brain. Do you have suggestions for some web pages or docs that I should read that cover this connection splitting/stitching back together aspect of BGP? Thanks!

Any and all advice you or others have to share at this point would be much appreciated, given that “need to purchase new router hardware now” has just become the #1 issue in my reality.

Thanks again,
–Lucky

brwainer · February 21, 2022, 5:15pm

I don’t personally have the expertise to guide you on setting up BGP based load balancing. The basic concept is that you receive multiple routes to the same destination, and then decide to split things up by session, or proportion of packets, etc.
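To make "split things up by session, or proportion of packets" concrete, here is a hypothetical Python sketch of both options. Per-session splitting hashes a flow's 5-tuple so every packet of one TCP connection takes the same path (SHA-256 is used here only for illustration; routers use their own hash functions). Per-packet splitting round-robins packets across paths, which lets one connection use several links but, with the made-up latencies below, delivers its packets out of order. Path names and latencies are assumptions, not measurements.

```python
from __future__ import annotations

import hashlib
from itertools import cycle

# Hypothetical tunnels, one per uplink.
PATHS = ["vdsl1", "vdsl2", "starlink", "5g"]

# Hypothetical one-way latencies in ms; real values on these links vary constantly.
LATENCY_MS = {"vdsl1": 20, "vdsl2": 22, "starlink": 45, "5g": 30}


def per_flow_path(src: str, dst: str, sport: int, dport: int, proto: str = "tcp") -> str:
    """Per-session split: hash the 5-tuple so every packet of a flow uses one path."""
    key = f"{proto}|{src}|{dst}|{sport}|{dport}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return PATHS[bucket % len(PATHS)]


def per_packet_paths(n_packets: int) -> list[str]:
    """Per-packet split: round-robin each successive packet onto the next path."""
    rr = cycle(PATHS)
    return [next(rr) for _ in range(n_packets)]


def arrival_order(n_packets: int, send_interval_ms: float = 1.0) -> list[int]:
    """Sequence numbers in the order they arrive when one flow is sprayed per packet."""
    arrivals = [
        (seq * send_interval_ms + LATENCY_MS[path], seq)
        for seq, path in enumerate(per_packet_paths(n_packets))
    ]
    return [seq for _, seq in sorted(arrivals)]


# Every packet of this one TCP connection lands on the same path...
print(per_flow_path("192.0.2.10", "198.51.100.7", 50000, 443))
# ...whereas spraying it per packet reorders it on arrival.
print(arrival_order(8))  # [0, 1, 4, 5, 3, 7, 2, 6]
```

The first behaviour is why a single TCP connection normally cannot exceed the speed of its one path; the second is why spraying packets across dissimilar links tends to hurt TCP rather than help it, since reordering looks like loss to the sender.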

The only software I can recommend beyond pfSense is Untangle Micro Edge. Micro Edge is going to help you more with load balancing options. I would spec a computer about 25% more powerful if going with Micro Edge vs pfSense - just a gut feeling about relative performance.

FredFerrell · February 22, 2022, 12:10am

You can do load sharing across multiple BGP paths simultaneously using a maximum path configuration. This would be fine if the paths go across similar circuits such as dual 10G DCIs or 1G DIAs.
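A back-of-the-envelope way to see why equal multipath suits similar circuits: if flows hash evenly across N paths, each path carries about 1/N of the load, so the slowest one saturates first and caps the total near N × (slowest capacity). The Python sketch below uses that idealized model, which ignores TCP dynamics and uneven hashing, together with the downlink figures Lucky lists later in the thread.

```python
from __future__ import annotations


def ecmp_usable_mbps(capacities_mbps: list[int]) -> int:
    """Idealized equal split: each of N paths carries 1/N of the traffic,
    so the slowest path saturates first and caps the total at N * min."""
    return len(capacities_mbps) * min(capacities_mbps)


similar = [10_000, 10_000]         # e.g. dual 10G DCIs
dissimilar = [250, 250, 270, 700]  # 2x VDSL, Starlink, 5G downlinks (Mbps)

for label, links in (("similar", similar), ("dissimilar", dissimilar)):
    print(f"{label}: equal split ~{ecmp_usable_mbps(links)} Mbps of {sum(links)} Mbps raw")
```

For the similar pair the equal split uses everything; for the mixed set it strands roughly a third of the raw capacity, mostly on the 5G link.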

If your circuits aren’t similar I wouldn’t recommend this. I would look into something that looks at dynamic metrics because business class and cellular internet connections can vary greatly in performance at any given time.
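A hypothetical Python sketch of choosing a path from live measurements, in the spirit of the dynamic-metric approach described above (which PfR and SDWAN products automate with their own probes). The thresholds, path names, and the "first acceptable path in preference order, else lowest latency" rule are assumptions for illustration, not any product's actual policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PathProbe:
    """Latest measurements for one tunnel/uplink (values are hypothetical)."""
    name: str
    latency_ms: float
    loss_pct: float
    jitter_ms: float


# Hypothetical policy thresholds for a "good enough" path.
MAX_LATENCY_MS = 150.0
MAX_LOSS_PCT = 2.0
MAX_JITTER_MS = 30.0


def within_policy(p: PathProbe) -> bool:
    return (
        p.latency_ms <= MAX_LATENCY_MS
        and p.loss_pct <= MAX_LOSS_PCT
        and p.jitter_ms <= MAX_JITTER_MS
    )


def choose_path(probes: list[PathProbe], preferred_order: list[str]) -> str:
    """First path in preference order that meets policy; else the lowest-latency path."""
    by_name = {p.name: p for p in probes}
    for name in preferred_order:
        probe = by_name.get(name)
        if probe is not None and within_policy(probe):
            return name
    return min(probes, key=lambda p: p.latency_ms).name


probes = [
    PathProbe("vdsl1", latency_ms=25, loss_pct=0.1, jitter_ms=3),
    PathProbe("starlink", latency_ms=60, loss_pct=3.5, jitter_ms=25),
    PathProbe("5g", latency_ms=35, loss_pct=0.5, jitter_ms=12),
]
print(choose_path(probes, ["starlink", "5g", "vdsl1"]))  # 5g: Starlink's loss exceeds 2%
```

Re-running the choice every time new probes arrive is what lets this kind of approach follow circuits whose performance "can vary greatly at any given time".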

SDWAN is a packaged solution that you can buy, but you’ll pay accordingly. You could look to build something identical, but you would have the overhead of setting it up.

If I was building it for myself, I would buy 3 used Cisco routers with SEC/App-X licensing and configure eBGP with PfR. Here is an example of a router with the correct licensing: CISCO ISR4321-AX/K9 Gigabit Router ISR 4321 AX APPX SEC license *** NOT AFFECTED | eBay

My recommendation would be to set up a virtual lab such as GNS3 or EVE-NG and build it there first. This isn’t the easiest thing in the world to set up, but there is plenty of documentation available and you’re basically trading your time for lower cost.

If you were fine with using only one circuit at a time and prioritizing your paths, this could likely be done on pfSense, and it’s probably the first thing I would try since it keeps things simple. All you’ll need to configure on your home router will be AS prepending and local preference. This will direct your flows to prioritize a path both outbound and inbound. I use it all the time when connecting to public clouds across IPSec tunnels or leased circuits.
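The two knobs mentioned above map onto two of the early steps of BGP best-path selection. A heavily simplified Python model follows: a higher LOCAL_PREF wins (it steers your outbound traffic), and among equal LOCAL_PREF values the shorter AS_PATH wins, which is why prepending your own AS on the backup tunnel's advertisements steers the colo's return traffic onto the primary. Tunnel names and private AS numbers are hypothetical, and real BGP evaluates several more tie-breakers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BgpPath:
    via: str  # hypothetical tunnel name
    local_pref: int = 100  # common default; higher is preferred
    as_path: list[int] = field(default_factory=list)  # prepending makes this longer


def best_path(paths: list[BgpPath]) -> str:
    """Heavily simplified BGP choice: highest LOCAL_PREF, then shortest AS_PATH.

    Real BGP also compares weight (Cisco), origin, MED, eBGP vs. iBGP,
    IGP metric, router ID and more; those are omitted here.
    """
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path))).via


# Outbound: the home router sets LOCAL_PREF on routes learned over each tunnel.
outbound = [
    BgpPath("tunnel-coop1", local_pref=200, as_path=[65001]),
    BgpPath("tunnel-coop2", local_pref=100, as_path=[65002]),
]

# Inbound: the colo hears the home prefixes over both tunnels; the home router
# (AS 65010) prepends itself twice on the backup tunnel's advertisements.
inbound = [
    BgpPath("tunnel-primary", as_path=[65010]),
    BgpPath("tunnel-backup", as_path=[65010, 65010, 65010]),
]

print(best_path(outbound))      # tunnel-coop1
print(best_path(inbound))       # tunnel-primary
print(best_path(outbound[1:]))  # tunnel-coop2 once the primary path disappears
```

This is strictly active/standby: one tunnel carries traffic in each direction until it fails, which matches the "only one circuit at a time" caveat.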

As for servers, I’m a fan of the Dell R6xx series. They seem to be plentiful so it keeps the cost down and easy to source parts.

Lucky · February 22, 2022, 2:58am

Fred,

Many thanks again for the time you took with your response.

Since the circuits that I was able to assemble are highly dissimilar, “a packaged solution that you can buy, but you’ll pay accordingly” is sounding increasingly appealing. I am, within the bounds of reason, willing to trade time and convenience for money.

I have one super-clean alternative to stitching together dissimilar circuits into one TCP connection: in the sidewalk directly outside my house, as holds true for all houses on my street, is a sub-sidewalk junction box. In that junction box is a fat bundle of fiber. Getting from the sidewalk junction box to the logical location of a demarc in my basement requires trenching 15’ through a carport consisting of sand: a job that I could do myself with hand tools on a Saturday, splice included.

Since I am stuck in Berlin, I am in DTAG territory. The regulatory capture of the German government by DTAG is watertight. DTAG will be happy to run the fiber drop to my house if I sign the following contract:

The cost of the drop shall not be lower than EUR 10,000. This I might, reluctantly, consider.
DTAG unilaterally and without appeal determines the amount charged to me post-installation. I, in advance, contractually agree to pay that amount, be the DTAG determination EUR 10k, 100k, or 1M. This I cannot agree to.

Hence, any commercial, quick to deploy, 3-node SDWAN solution under a one-time cost of $10k is not that shabby of a deal from my perspective.

Unfortunately, this router (the ISR4321) appears to have 2x 1Gbps NICs, presumably one for the uplink and one for the inside. I need an aggregate of 3Gbps. At the ISP coop sites, the physical uplink will be 10Gbps fiber, so an SFP+ NIC will be required in the routers at both ISP coop sites.

Being able to get one TCP connection with a throughput greater than my fastest uplink is the primary objective of this exercise. It is unclear to me how to achieve that if I can use only one circuit at a time. What am I missing?

I am a big fan of Dell servers myself, especially those with DRAC 8 or higher. If there is a homebrew solution, I would want to go that route. My question was more along the lines of “which specs R6xx” would be required by a homebrew solution that cleanly handles at least 3Gbps of IPsec on the WAN side plus all the internal SFP+ VLAN routing?

In short, I suspect the question may be “what is the SFP+ version of the router that you are recommending and how much am I looking at per unit”?

Many thanks,
– Lucky

FredFerrell · February 22, 2022, 1:44pm

I agree that the router I listed wouldn’t support your bandwidth and connectivity requirements as is. I just wanted to provide an example for the licensing required. In your specific use case I would research the Cisco ASR routers such as the 1001-X and add whatever line cards you need for connectivity.

It also makes sense to get some quotes for SDWAN so you’ll know the costs either direction you decide to go.

As for sizing a Dell server for 3Gbps of IPSec traffic, I wouldn’t go down this road. If your intent is to use multiple circuits simultaneously, pfSense doesn’t seem to be a good solution.