[Proxmox Deployment Advice] Designing a Reliable, Cost-Effective Infrastructure

Hi everyone,

I’m currently working on redesigning the IT infrastructure for a retail company with 8 stores + a Head Office (HQ). I’d love your insights on architecture, storage, and hypervisor choices based on what I have and what I aim to achieve.


CURRENT SITUATION

  • The infrastructure is fully physical.
  • HQ has 4x HPE DL380 Gen10 servers, each with:
    • 2x Intel Xeon Gold 6138 (40 cores total)
    • 64GB RAM (each – can be upgraded)
    • RAID controller onboard (Smart Array)
    • All servers came with 4x SAS HDDs (10K RPM 2.4TB – HPE 881507-001).
  • We also have old Dell R610s at shops running lightweight apps.
  • Users connect to servers via RDP sessions (Remote Desktop), even for very light workloads.
  • This results in power waste, fragmented management, resource underutilization, and no virtualization.

GOALS

  • Centralize all VM workloads at HQ (no more servers in stores).
  • Reduce power consumption and physical complexity.
  • Achieve cost-effective, stable, long-term infrastructure.
  • Virtualize POS app servers, File Server, Finance (SAGE), AD/DNS, and some Linux tools (Wazuh, Zabbix, GLPI).
  • Ensure safe backups and possibly minimal HA without overkill.

CONCERNS

  • I considered Ceph, but I’m worried about:
    • Complexity of setup and maintenance
    • Need for 10GbE storage backend
    • Higher resource usage and possible instability
  • I also wonder if ZFS is reliable enough for production, especially with many Windows Server VMs.
  • Is it safe to run PBS in-cluster?
  • Can I reuse the SAS 10K disks efficiently, or should I ditch them?
  • Should I mix SATA/NVMe and SAS in this setup?

OPTIONS I’M CONSIDERING

:small_blue_diamond: Option 1: 3x Standalone Proxmox Hosts + 1 Backup Host

  • 3 standalone PVE nodes (each with a local ZFS mirror for VM storage; rough sketch after this list)
  • Node4 hosts PBS and TrueNAS (as VMs)
  • Each node handles part of the VM load
  • Simple setup, no clustering, easy maintenance
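
For reference, the per-node local ZFS mirror would only be a couple of commands; a rough sketch, with pool name and device names as placeholders (by-id paths would be better in practice):

```
# Mirrored ZFS pool on two local NVMe/SSD drives (example device names)
zpool create -o ashift=12 vmdata mirror /dev/nvme0n1 /dev/nvme1n1

# Register it in Proxmox as a zfspool storage for VM disks and containers
pvesm add zfspool vmdata --pool vmdata --content images,rootdir
```

Using the same storage ID on every node would also keep the door open for clustering and replication later.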

:white_check_mark: Pros:

  • Simplicity, isolation
  • Low overhead
  • Clean backups

:cross_mark: Cons:

  • No HA
  • No shared storage
  • Manual failover

:small_blue_diamond: Option 2: 2x Hyper-V Hosts + 1 Proxmox for Tools

  • 2 Hyper-V nodes for Windows VMs
  • 1 Proxmox node for Linux tools (Wazuh, Zabbix, etc.)
  • Use Windows-friendly backups (Veeam / Windows Server Backup)

:white_check_mark: Pros:

  • Simpler for Windows
  • Familiar to staff
  • Easy RDS-style setup

:cross_mark: Cons:

  • Fragmented stack
  • No advanced features (ZFS, PBS)
  • Monitoring/tools on separate platform

:small_blue_diamond: Option 3: Full 4-Node Proxmox HA Cluster

  • All 4 DL380s in a single Proxmox Cluster
  • Each node runs ZFS (NVMe mirror) locally
  • VMs distributed across nodes
  • PBS and TrueNAS run as VMs in Node4
  • HDD SAS passthrough via HBA to PBS / TrueNAS
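
Rough sketch of what the HBA passthrough on Node4 would look like (VMID and PCI address are placeholders; assumes IOMMU is enabled in the BIOS and the VM uses the q35 machine type):

```
# Enable IOMMU on the host (Intel example) and reboot:
#   /etc/default/grub -> GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
update-grub

# Find the LSI 9300-8i's PCI address
lspci -nn | grep -i LSI

# Pass the whole HBA to the PBS/TrueNAS VM (example VMID 200, example PCI address)
qm set 200 --hostpci0 0000:3b:00.0,pcie=1
```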

:white_check_mark: Pros:

  • Single pane of glass
  • Native HA / quorum
  • Optimal reuse of hardware
  • PBS + NAS integrated

:cross_mark: Cons:

  • A bit more complex (quorum, fencing)
  • PBS in-cluster (questionable redundancy)
  • No shared storage, so live migration is limited unless storage replication is handled manually

:hammer_and_wrench: HARDWARE BEING CONSIDERED

  • Add 2x SSD SATA 960GB per node (RAID1 boot ext4 via Smart Array)
  • Add 2x Micron 7450 Pro/Max 1.92TB U.2 per node (ZFS mirror)
  • Add LSI 9300-8i HBA to handle SAS HDD passthrough
  • Reuse the 10K SAS drives (2.4TB x12) on Node4 for PBS / TrueNAS via ZFS RAIDZ2 (see the sketch after this list)
  • Use FortiGate 81F + Cisco C9200L 10Gb Core Switch
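
A rough sketch of the Node4 backup pool, assuming the HBA (and its 12 SAS drives) is passed through to the PBS VM; device, pool, and datastore names are placeholders, and TrueNAS would do the equivalent from its UI:

```
# Inside the VM that owns the HBA: 12x 2.4TB SAS drives in RAIDZ2 (example device names)
zpool create -o ashift=12 backup raidz2 \
  /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf \
  /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl

# If this VM is PBS, create a datastore on that pool
zfs create backup/pbs
proxmox-backup-manager datastore create pbs-store /backup/pbs
```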

:red_question_mark: QUESTIONS

  1. Is Option 3 (Proxmox HA) a good long-term approach for this small/medium setup?
  2. Would Ceph really be worth the complexity in my case?
  3. Can I mix ZFS local pools per node with Proxmox HA cluster safely?
  4. Is it safe to run PBS in-cluster (Node4)?
  5. Would it be a mistake to reuse the SAS 10K drives via passthrough for PBS/NAS?
  6. For Windows workloads, is ZFS still the best choice or should I use ext4/RAID?
  7. Should I prefer ZFS boot or ext4 boot for these servers?
  8. Any advice on using Micron 7450 Pro vs Samsung PM893 vs HPE SSDs?

I’m seeking the most balanced, stable, and future-proof setup using the 4 DL380 Gen10 servers I already own. I want to make informed hardware and configuration choices (disk type, FS, layout, VM placement, PBS strategy…) and avoid the usual regrets (wasting SAS drives, under-using NVMe, no HA, etc.).

Any expert advice, real-world feedback, or example architectures would be tremendously appreciated.

Thanks in advance!

I prefer XCP-ng over Proxmox. HA is nice to have, but it always adds complexity. Hyper-converged storage solutions such as Ceph don’t perform well unless you have at least a 25G network connecting the nodes, and even then write performance won’t be amazing unless you use NVMe for the OSDs.

My simple setup for budget and performance is an XCP-ng system with TrueNAS shared storage.

XCP-ng & XO can also provide a good DR solution.


At my office, I have a setup similar to your Option 1, though I don’t see any reason not to cluster the three PVE nodes.

I have a three-node cluster; each node has local ZFS storage, which you can use as quasi-shared storage. The way to do that is to make sure the datastores are named identically on each node and then mark the storage as shared. With that in place, you’ll be able to set up periodic replication from node to node for specific VMs, which makes migration and failover among nodes pretty quick. It works really well for anything that doesn’t require to-the-second state to be kept.
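
For anyone finding this later, the CLI side of that is roughly the following (storage ID, VMID, node name, and schedule are examples; the same thing can be done from the GUI under Datacenter → Replication):

```
# On every node: identical pool name and identical storage ID
pvesm add zfspool vmdata --pool vmdata --content images,rootdir

# Replicate VM 101 to node pve2 every 15 minutes
pvesr create-local-job 101-0 pve2 --schedule "*/15"
```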

For VMs that do require that state be maintained, I use NFS datastores on a separate NAS.
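
Adding such an NFS datastore is a one-liner (server address and export path below are placeholders):

```
pvesm add nfs nas-vmstore --server 192.168.1.50 --export /mnt/tank/vmstore --content images
```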

I have opinions on some of your numbered questions, and I’ll try to come back to this after work.


Removing all local servers might not be best, depending on workflow. If the internet link goes down, will the stores still be able to conduct business? Local services for the price database and cash registers seem almost required in retail.

Okay, this may or may not prove helpful, but here’s what I’ve got:

  1. I like Option 1 better, but in a cluster. Though @Greg_E brings up a great point about the potential for issues at branches if network connectivity isn’t rock solid.
  2. Lots of people like Ceph, but my testing was kind of mixed and I ultimately decided it wasn’t worth the complexity.
  3. Yes, I’ve had great success with this. Name the datastores the same when you add them to PVE and mark them as shared. Then you can set periodic replication to the other nodes on a per-VM basis. I have several that replicate every 15 minutes and it greatly speeds up migration or failover.
  4. Yes-ish. I do this at home in my lab and it’s fine. At work I prefer separate backup hardware because it’s less brittle for disaster recovery (I don’t have to rely on any existing PVE nodes to recover VMs if everything goes wrong). But: my backup server is running Proxmox (outside my cluster) and I have PBS in an LXC container so I can easily take snapshots from the UI.
  5. I don’t have any experience with 10K SAS disks, but passthrough is solid. Again, I prefer separate hardware for my storage, but virtualized NAS options work fine.
  6. I’d much rather run on ZFS than any traditional RAID. One tip: when you add the ZFS datastore to PVE, the default volblocksize for zvols is lower than optimal for most workloads. There’s really good discussion of this on practicalzfs.com, but essentially, setting it to 64k is a good bet for Windows guests (there’s a small example at the end of this post).
  7. Again, I’d always prefer ZFS. You get snapshots, scrubs, and replication out of it, which makes backups and disaster recovery much nicer once you wrap your head around it. Sanoid and syncoid are great tools for managing that (example config at the end of this post).
  8. I don’t have any thoughts on this one. I’ve shied away from M.2 form factors in servers over thermal concerns; I’m using SATA SSDs and SAS spinning rust instead.
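
On point 6, the block size is set on the PVE storage definition and only applies to newly created disks; a minimal example, assuming a zfspool storage named vmdata:

```
# 64k volblocksize for new VM disks on this storage
pvesm set vmdata --blocksize 64k
# Existing zvols keep the volblocksize they were created with;
# move or recreate the disk to change it
```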
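
And on point 7, a minimal sanoid/syncoid sketch (pool, dataset, and host names are placeholders; retention numbers are just an example):

```
# /etc/sanoid/sanoid.conf
[vmdata]
        use_template = production
        recursive = yes

[template_production]
        frequently = 0
        hourly = 36
        daily = 30
        monthly = 3
        autosnap = yes
        autoprune = yes

# Push the snapshots to another box with syncoid (e.g. from cron)
syncoid --recursive vmdata root@backup-host:backup/vmdata
```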