Designing a Resilient Network for Our Data Center – Need Advice!

We’re working on designing our data center network with a strong focus on failover and resilience.

Our provider gives us two 10Gb feeds, and we’re planning to use:

  • 2 x pfSense (HA)
  • 2 x 10Gb switches
  • 2 x 1Gb switches

I’ve put together a diagram showing my proposed setup.

My question:

  • Have I overlooked anything critical?
  • How would you approach this setup?

I’d really appreciate any feedback or suggestions!

EDIT: This is the guide I have followed - Layer 2 Redundancy | pfSense Documentation

I would make sure you set up your root bridge on the 10Gb switch that is connected to the primary pfSense.
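
For what it's worth, the election is just "lowest bridge ID (priority, then MAC) wins," so dropping the priority on that 10Gb switch pins the root there. A rough Python sketch of the comparison, with placeholder switch names, priorities, and MAC addresses:

```python
# Rough sketch of STP root-bridge election: the lowest (priority, MAC) pair wins.
# Switch names, priorities, and MAC addresses are placeholders, not real config.
switches = {
    "10g-sw1-primary": {"priority": 4096,  "mac": "00:11:22:aa:bb:01"},  # lowered priority
    "10g-sw2":         {"priority": 32768, "mac": "00:11:22:aa:bb:02"},
    "1g-sw1":          {"priority": 32768, "mac": "00:11:22:aa:bb:03"},
    "1g-sw2":          {"priority": 32768, "mac": "00:11:22:aa:bb:04"},
}

# The bridge ID is (priority, MAC); the switch with the lowest tuple becomes root.
root = min(switches, key=lambda name: (switches[name]["priority"], switches[name]["mac"]))
print(f"Root bridge: {root}")  # -> 10g-sw1-primary
```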

Also, are all your default gateways configured on your pfSense firewalls?

Agreed. The Root Bridge will be the 10Gb switch connected to the primary pfSense.

Yes, the servers behind pfSense will have that as their gateway. The gateways will be CARP IP addresses for HA failover.
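
To spell out how that keeps the gateway stable: both firewalls advertise for the same virtual IP, and the live node with the lowest advskew owns it, so the servers never have to change their gateway. A toy Python illustration; the VIP and advskew values below are made up:

```python
# Toy illustration of CARP-style failover: the servers always point at the same
# VIP, and the live node with the lowest advskew owns it.
# The VIP and advskew values are made up for the example.
VIP = "10.0.0.1"

nodes = [
    {"name": "pfsense-1", "advskew": 0,   "alive": True},   # primary
    {"name": "pfsense-2", "advskew": 100, "alive": True},   # backup
]

def carp_master(nodes):
    live = [n for n in nodes if n["alive"]]
    return min(live, key=lambda n: n["advskew"])["name"] if live else None

print(carp_master(nodes))       # pfsense-1 holds the VIP
nodes[0]["alive"] = False       # primary goes down
print(carp_master(nodes))       # pfsense-2 takes over; the gateway stays 10.0.0.1
```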

Are you using an MC-LAG setup for redundancy between the switches and the hosts connected to them?

The switches we will be using do not support MC-LAG, unfortunately. They will be linked using 10Gb DAC cables.

Are your servers virtualized, bare-metal, or a mix?

They’ll be a mix: Hyper-V clusters and NAS devices, for example.

Sounds good. On the server side of things I would look into NIC teaming in switch-independent mode with active/standby, or IP bonding, so no switch awareness is required. If you really want to get fancy, you could look at running OSPF locally on the servers as well as on the pfSense boxes.
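
To illustrate why no switch awareness is needed: in switch-independent active/standby teaming the failover decision lives entirely on the host. A rough Python sketch of that host-side logic (interface names and link states are hypothetical):

```python
# Rough sketch of host-side active/standby NIC selection: all failover logic
# sits on the server, so the switches need no LACP or MC-LAG configuration.
# Interface names and link states here are hypothetical.
team = [
    {"nic": "eth0", "link_up": True},   # active, cabled to 10Gb switch 1
    {"nic": "eth1", "link_up": True},   # standby, cabled to 10Gb switch 2
]

def active_nic(team):
    # Use the first member whose link is up; list order encodes active vs standby.
    for member in team:
        if member["link_up"]:
            return member["nic"]
    return None

print(active_nic(team))         # eth0 carries traffic
team[0]["link_up"] = False      # switch 1, the cable, or the NIC fails
print(active_nic(team))         # eth1 takes over with no switch-side changes
```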

Thanks :) Yes, we’re using NIC teaming in the lab Hyper-V setup and it’s working well.

I will look into OSPF. I have no experience with it! How would it benefit us?

The benefit is minimal, and it would be risky if you don’t have any experience with it. As long as you have L1 or L2 redundancy, you’ll be fine.

Has there been any consideration of a fully L3 design?

Running L3 switches with OSPF back to your routers/firewalls would be a good solution too. You could organize different regions of your farm by IP subnet, e.g. Cluster-1 uses 10.0.1.0/24, Cluster-2 uses 10.0.2.0/24, etc.

This way you’d have total L2 segregation. (I’m just not much of an L2 guy - I move everything to L3 where possible.)
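
To make the addressing side concrete, here’s a quick Python sketch (using the standard ipaddress module) that carves one /24 per cluster out of a 10.0.0.0/16 supernet; the supernet and cluster names are just examples:

```python
# Quick sketch of a per-cluster addressing plan for an L3 design: one routed /24
# per cluster carved out of a 10.0.0.0/16 supernet.
# The supernet and cluster names are examples, not a recommendation.
import ipaddress

supernet = ipaddress.ip_network("10.0.0.0/16")
clusters = ["Cluster-1", "Cluster-2", "Hyper-V", "NAS"]

subnets = supernet.subnets(new_prefix=24)
next(subnets)  # skip 10.0.0.0/24, e.g. reserve it for transit/infrastructure links

for name, subnet in zip(clusters, subnets):
    hosts = list(subnet.hosts())
    print(f"{name}: {subnet}  gateway {hosts[0]}  usable {hosts[1]}-{hosts[-1]}")
```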

This also depends on your server connectivity architecture. Are you working with large clusters of identically configured systems, or are we looking at more of a small enterprise setup with onesie-twosie type systems?

The only concern I have with running L3 on the switch is allowing inter-VLAN routing. Even though you can use ACLs, I would want my network security centralized on the firewall only.

We’ve considered an L3 design, but we need more granular control than switch ACLs can provide.

Fair enough. My main concerns would be common points of failure and the scale of the network: broadcast domain size, and bandwidth availability (i.e. forcing server X to route via the firewall to talk to server Y, when it could just route via the local top-of-rack L3 switch, burns bandwidth on the upstream links).
Again, I don’t know how large your environment is, but just a few thoughts.

What type of granularity isn’t available via ACLs?

Unless we start talking about micro-segmentation products such as NSX, ACI, or ISE, I would say very little, since either way you’re only filtering on IPs and ports. Even for very large environments, a centralized firewall for east/west traffic filtering is very common.

We have many customer networks, which are more easily managed on the firewalls.

Totally makes sense.
I suppose I was thinking more along the lines of “if you can afford to route on your L3 switch, do it. Otherwise, utilize the firewall to handle the inter-VLAN routing.”