[Proxmox Deployment Advice] Designing a Reliable, Cost-Effective Infrastructure for a Retail Company

Hi everyone,

I’m currently working on redesigning the IT infrastructure for a retail company with 8 stores + a Head Office (HQ). I’d love your insights on architecture, storage, and hypervisor choices based on what I have and what I aim to achieve.


CURRENT SITUATION

  • The infrastructure is fully physical.
  • HQ has 4x HPE DL380 Gen10 servers, each with:
    • 2x Intel Xeon Gold 6138 (20 cores each, 80 threads total)
    • 64GB RAM (each – can be upgraded)
    • RAID controller onboard (Smart Array)
    • All servers came with 4x SAS HDDs (10K RPM 2.4TB – HPE 881507-001).
  • We also have old Dell R610s at shops running lightweight apps.
  • Users connect to servers via RDP sessions (Remote Desktop), even for very light workloads.
  • This results in wasted power, fragmented management, and underutilized resources, with no virtualization anywhere.

GOALS

  • Centralize all VM workloads at HQ (no more servers in stores).
  • Reduce power consumption and physical complexity.
  • Achieve cost-effective, stable, long-term infrastructure.
  • Virtualize POS app servers, File Server, Finance (SAGE), AD/DNS, and some Linux tools (Wazuh, Zabbix, GLPI).
  • Ensure safe backups and possibly minimal HA without overkill.

CONCERNS

  • I considered Ceph, but I’m worried about:
    • Complexity of setup and maintenance
    • Need for 10GbE storage backend
    • Higher resource usage and possible instability
  • I also wonder if ZFS is reliable enough for production, especially with many Windows Server VMs.
  • Is it safe to run PBS in-cluster?
  • Can I reuse the SAS 10K disks efficiently, or should I ditch them?
  • Should I mix SATA/NVMe and SAS in this setup?

OPTIONS I’M CONSIDERING

:small_blue_diamond: Option 1: 3x Standalone Proxmox Hosts + 1 Backup Host

  • 3 standalone PVE nodes (each with ZFS local mirror for VMs)
  • Node4 hosts PBS and TrueNAS (as VMs)
  • Each node handles part of the VM load
  • Simple setup, no clustering, easy maintenance

:white_check_mark: Pros:

  • Simplicity, isolation
  • Low overhead
  • Clean backups

:cross_mark: Cons:

  • No HA
  • No shared storage
  • Manual failover

:small_blue_diamond: Option 2: 2x Hyper-V Hosts + 1 Proxmox for Tools

  • 2 Hyper-V nodes for Windows VMs
  • 1 Proxmox node for Linux tools (Wazuh, Zabbix, etc.)
  • Use Windows-friendly backups (Veeam / Windows Server Backup)

:white_check_mark: Pros:

  • Simpler for Windows
  • Familiar to staff
  • Easy RDS-style setup

:cross_mark: Cons:

  • Fragmented stack
  • No advanced features (ZFS, PBS)
  • Monitoring/tools on separate platform

:small_blue_diamond: Option 3: Full 4-Node Proxmox HA Cluster

  • All 4 DL380s in a single Proxmox Cluster
  • Each node runs ZFS (NVMe mirror) locally
  • VMs distributed across nodes
  • PBS and TrueNAS run as VMs in Node4
  • SAS HDD passthrough via HBA to PBS / TrueNAS

:white_check_mark: Pros:

  • Single pane of glass
  • Native HA / quorum
  • Optimal reuse of hardware
  • PBS + NAS integrated

:cross_mark: Cons:

  • A bit more complex (quorum, fencing)
  • PBS in-cluster (questionable redundancy)
  • No shared storage, so only limited live migration unless manually handled
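From what I’ve read, forming the Option 3 cluster itself is only a few commands on a stock PVE install; a minimal sketch (cluster name and IP are placeholders):

```
# On the first node: create the cluster
pvecm create retail-cluster

# On each of the other three nodes: join, pointing at the first node's IP
pvecm add 10.0.0.11

# Verify membership and quorum
pvecm status
```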

:hammer_and_wrench: HARDWARE BEING CONSIDERED

  • Add 2x 960GB SATA SSDs per node (RAID1 ext4 boot via Smart Array)
  • Add 2x Micron 7450 Pro/Max 1.92TB U.2 per node (ZFS mirror)
  • Add LSI 9300-8i HBA to handle SAS HDD passthrough
  • Reuse the 10K SAS drives (12 × 2.4TB) on Node4 for PBS / TrueNAS via ZFS RAIDZ2 (pool sketch below)
  • Use FortiGate 81F + Cisco C9200L 10Gb Core Switch
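For the RAIDZ2 reuse idea, the pool creation I have in mind would look roughly like this (pool name and device paths are placeholders; run wherever the HBA ends up, host or passed-through VM):

```
# 12-wide RAIDZ2 over the 2.4TB 10K SAS drives: ~10 data disks of usable
# capacity, survives any two disk failures. Use stable /dev/disk/by-id paths.
zpool create -o ashift=12 backup raidz2 \
  /dev/disk/by-id/scsi-DRIVE01 /dev/disk/by-id/scsi-DRIVE02 \
  /dev/disk/by-id/scsi-DRIVE03 /dev/disk/by-id/scsi-DRIVE04 \
  /dev/disk/by-id/scsi-DRIVE05 /dev/disk/by-id/scsi-DRIVE06 \
  /dev/disk/by-id/scsi-DRIVE07 /dev/disk/by-id/scsi-DRIVE08 \
  /dev/disk/by-id/scsi-DRIVE09 /dev/disk/by-id/scsi-DRIVE10 \
  /dev/disk/by-id/scsi-DRIVE11 /dev/disk/by-id/scsi-DRIVE12

# Sanity-check the layout
zpool status backup
```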

:red_question_mark: QUESTIONS

  1. Is Option 3 (Proxmox HA) a good long-term approach for this small/medium setup?
  2. Would Ceph really be worth the complexity in my case?
  3. Can I mix ZFS local pools per node with Proxmox HA cluster safely?
  4. Is it safe to run PBS in-cluster (Node4)?
  5. Would it be a mistake to reuse the SAS 10K drives via passthrough for PBS/NAS?
  6. For Windows workloads, is ZFS still the best choice or should I use ext4/RAID?
  7. Should I prefer ZFS boot or ext4 boot for these servers?
  8. Any advice on using Micron 7450 Pro vs Samsung PM893 vs HPE SSDs?

I’m seeking the most balanced, stable, and future-proof setup using the 4 DL380 Gen10 servers I already own. I want to make informed hardware and configuration choices (disk type, FS, layout, VM placement, PBS strategy…) and avoid the usual regrets (wasting SAS drives, under-using NVMe, no HA, etc.).

Any expert advice, real-world feedback, or example architectures would be tremendously appreciated.

Thanks in advance!

I prefer XCP-ng over Proxmox. HA is nice to have, but it always adds complexity. Hyperconverged storage solutions such as Ceph are not good on performance unless you have at least 25GbE connecting the nodes, and they still won’t have amazing write performance unless you use NVMe for the OSDs.

My simple setup for budget and performance is an XCP-ng system with TrueNAS shared storage.

XCP-ng & XO can also provide a good DR solution.


At my office, I have a similar setup to your Option 1, though I don’t see any reason not to cluster the three PVE nodes.

I have a three-node cluster; each node has local ZFS storage, which you can use as quasi-shared storage. The way you do that is to make sure the datastores are named identically on each node and then mark the storage as shared. With that in place, you’ll be able to set up periodic replication from machine to machine for specific VMs, which makes migration and failover among nodes pretty quick. It works really well for anything that doesn’t require to-the-second state to be kept.
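If it helps, the CLI version of the replication part is short (storage ID, pool, VMID, and node name below are placeholders):

```
# On every node: add the local pool under the SAME storage ID
pvesm add zfspool local-zfs --pool tank/vms --content images,rootdir

# Replicate VM 101 to node pve2 every 15 minutes (job ID is <vmid>-<number>)
pvesr create-local-job 101-0 pve2 --schedule "*/15"

# Check replication state
pvesr status
```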

For VMs that do require that state be maintained, I use NFS datastores on a separate NAS.

I have opinions on some of your numbered questions, and I’ll try to come back to this after work.


Removing all local servers might not be best, depending on workflow. If the internet link goes down, will they still be able to conduct business? Local services for price database and cash register seem almost required in retail.

Okay, this may or may not prove helpful, but here’s what I’ve got:

  1. I like option 1 better, but in a cluster. Though @Greg_E brings up a great point about the potential for issues at branches if network connectivity isn’t rock solid.
  2. Lots of people like ceph, but my testing was kind of mixed and I ultimately decided it wasn’t worth the complexity.
  3. Yes, I’ve had great success with this. Name the datastores the same when you add them to PVE and mark them as shared. Then you can set periodic replication to the other nodes on a per-VM basis. I have several that replicate every 15 minutes and it greatly speeds up migration or failover.
  4. Yes-ish. I do this at home in my lab and it’s fine. At work I prefer separate backup hardware because it’s less brittle for disaster recovery (I don’t have to rely on any existing PVE nodes to recover VMs if everything goes wrong). But: my backup server is running Proxmox (outside my cluster) and I have PBS in an LXC container so I can easily take snapshots from the UI.
  5. I don’t have any experience with 10K SAS disks, but passthrough is solid. Again, I prefer separate hardware for my storage, but virtualized NAS options work fine.
  6. I’d much rather run on ZFS than any traditional RAID. One tip: when you add the ZFS datastore to PVE, the default volblocksize for zvols is lower than optimal for most workloads. There’s really good discussion of this on practicalzfs.com, but essentially, setting it to 64k is a good bet for Windows hosts (see the sketch after this list).
  7. Again, I’d always prefer ZFS. You get snapshots, scrubs, and replication out of it. It makes backups and disaster recovery much nicer once you wrap your head around it. Sanoid and syncoid are great tools for managing that.
  8. I don’t have any thoughts on this one. I’ve shied away from M.2 form factors in servers over thermal concerns; I’m using SATA SSDs and SAS spinning rust.
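Here’s the volblocksize tip from point 6 in concrete form (storage ID and dataset name are placeholders; note the setting only affects zvols created after the change):

```
# Set 64k volblocksize for new VM disks on this ZFS storage
pvesm set local-zfs --blocksize 64k

# Verify on a disk created afterwards
zfs get volblocksize tank/vms/vm-101-disk-0
```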

Hi everyone,

Thanks again for the valuable input in this thread; your advice has really helped me shape the direction of my deployment.


:white_check_mark: Final Decision (Mixed Hypervisor Strategy)

After much consideration, I’ve decided not to fully migrate to Proxmox just yet.

  • I’m installing Hyper-V on 2 of my HPE DL380 Gen10 servers to host core production workloads (POS application servers, SAGE, AD, DNS, file shares, etc.)
  • These workloads are too critical to risk major disruption, and I managed to get very affordable Datacenter licenses, which makes Hyper-V a safer bet for now.

:test_tube: Proxmox for Monitoring, SOC & Add-ons

The other 2 DL380 Gen10 servers are being dedicated to Proxmox, where I’ll run:

  • Zabbix, Grafana, GLPI
  • A mini SOC stack (Wazuh, Velociraptor)
  • A few test VMs (Linux, Windows)
  • Possibly PBS (Proxmox Backup Server), though I still have questions about its placement (see below)

These services are not mission-critical, but rather value-added initiatives on my part, so I can safely experiment and get comfortable with Proxmox and ZFS without pressure.


:gear: Current Storage Plan (Work-in-Progress)

Each DL380 has a single 8-bay backplane and is currently equipped with 4 × 2.4TB 10K SAS HDDs.

:yellow_square: 1) BOOT Disks (RAID1 + ext4)

Objective: reliability, decent endurance, minimal cost. 240–960 GB is plenty.

:white_check_mark: Most likely choice:

  • 2 × Micron 5300 PRO 480 GB SATA (1 DWPD)
    • Excellent value, reliable, long lifespan; seems ideal for a boot OS.

:package: OEM-style alternatives (though I’m not strict about OEM):

  • 2 × HPE 480 GB SAS RI SSD – P04516-B21
    • SAS dual-port; solid, but more expensive.
  • 2 × HPE 480 GB SATA RI (or P18424-B21 if going NVMe, which I’d rather avoid)

:light_bulb: These will be configured as RAID1 with ext4, using the SmartArray controller (with cache and battery).
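For reference, I expect the Smart Array side to be a one-liner with HPE’s ssacli tool; a sketch, with controller slot and drive bay numbers as placeholders:

```
# Create a RAID1 logical drive from the two boot SSDs on the Smart Array
ssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=1
```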

:pushpin: Note: I don’t care about staying OEM-compliant. I just want enterprise-grade reliability and durability, whichever vendor delivers it best (Micron or HPE).


:green_square: 2) ZFS Pool for VMs (mirrors preferred)

:light_bulb: Important: I don’t want to place ZFS disks behind the SmartArray RAID controller. I plan to use an LSI 9300-8i HBA (IT mode) for full passthrough.

:red_triangle_pointed_down: BUT: I’d prefer to avoid NVMe altogether if possible:

  • I want to keep things simple, avoid special backplanes, PCIe adapters, U.2/U.3 cabling, etc.
  • I’m not convinced the added complexity of NVMe is justified for my use case.

That said, here are the options I’m considering:

:blue_circle: High-performance / Long-term (if I must go NVMe):

  • 4 × Micron 7450 PRO 1.92 TB U.2 NVMe (1 DWPD)
    • Great endurance & low latency
    • Layout: 2 mirrored pairs (ZFS RAID10) → ~3.8 TB usable

:green_circle: Balanced option using existing SAS backplane (preferred):

  • 4 × HPE 1.92 TB SAS RI SSD (P07922-B21)
    • ZFS mirror layout (2 pairs) → ~3.8 TB usable
    • No need to touch backplane — cleaner and simpler
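Either variant would end up as the same pool layout; a minimal sketch, assuming the HBA presents the four SSDs (pool name and device paths are placeholders):

```
# Two mirrored pairs striped together ("ZFS RAID10") -> ~3.8 TB usable
zpool create -o ashift=12 vmpool \
  mirror /dev/disk/by-id/scsi-SSD1 /dev/disk/by-id/scsi-SSD2 \
  mirror /dev/disk/by-id/scsi-SSD3 /dev/disk/by-id/scsi-SSD4

# lz4 is cheap and generally a win for VM images
zfs set compression=lz4 vmpool
```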

:speech_balloon: Questions I have here:

  • Between Micron vs HPE SAS SSDs, which would be the most durable and stable choice for ZFS in a Proxmox setup?
  • Is Micron 5300/7450 series known to work well long-term under Proxmox ZFS?
  • Any gotchas or incompatibility issues with Micron disks in DL380 Gen10?

:brown_square: 3) Reusing 10K SAS HDDs

I’m also planning to reuse some of the original 10K SAS drives (2.4TB) in the Proxmox nodes (2 or 3 per server) for:

  • Logs or long-retention datasets
  • Passthrough to a PBS or TrueNAS VM via the HBA
  • Possibly for local testing of ZFS RAIDZ2, if useful

:light_bulb: I’m assuming passthrough via the LSI 9300-8i HBA will work reliably; if anyone has tested that combo with DL380 Gen10 + PBS/TrueNAS VMs, I’d love confirmation.
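On the PVE side, my understanding is that whole-controller passthrough boils down to this (VMID and PCI address are placeholders; IOMMU has to be enabled first):

```
# 1. Enable IOMMU in the kernel cmdline (Intel CPUs): intel_iommu=on iommu=pt
# 2. Find the HBA's PCI address
lspci | grep -i -e lsi -e sas3008
# 3. Hand the whole controller to the PBS/TrueNAS VM
qm set 200 --hostpci0 0000:03:00.0
```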


:brain: Clustering & Quorum Questions

I’m setting up a 2-node Proxmox cluster (just the two Proxmox DL380s; the other two are Hyper-V).

:red_triangle_pointed_up: I understand that a 2-node cluster requires a QDevice to avoid split-brain (a setup sketch follows the questions below).

  • :white_check_mark: Is a Raspberry Pi or Intel NUC sufficient for QDevice?
  • :puzzle_piece: Where should the QDevice ideally be placed in this setup?
  • :pushpin: Any best practices when using QDevice in mixed OS environments (e.g. with Hyper-V nodes on the same LAN)?
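From the docs, the QDevice setup I’m planning looks like this (the QDevice IP is a placeholder; pvecm needs root SSH access to that host during setup):

```
# On the QDevice host (e.g. a Pi or NUC running Debian):
apt install corosync-qnetd

# On BOTH Proxmox nodes:
apt install corosync-qdevice

# From one cluster node, register the QDevice:
pvecm qdevice setup 10.0.0.50
pvecm status   # expected votes should now read 3
```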

:luggage: PBS Deployment Question

I need advice on where to place Proxmox Backup Server (PBS).

Options I’m considering:

  1. Run PBS as a VM within one of the 2 Proxmox nodes (non-critical workloads only)
  2. Deploy PBS on a spare R610 or R620 (standalone, no cluster)

:red_question_mark: Which would be more resilient, easier to recover, and cleaner in practice?

  • Is PBS as a VM “safe enough” if all ZFS volumes and backups are well replicated externally?
  • Or should I offload it entirely to a physical box to avoid dependency on Proxmox in case of disaster?

:bullseye: My Overall Goal

I want to build a simple, reliable Proxmox cluster to gradually explore the platform while keeping things maintainable. In summary:

  • :white_check_mark: Minimal complexity (avoid NVMe unless truly justified)
  • :white_check_mark: Durable storage (long DWPD, low failure risk)
  • :white_check_mark: Proper separation (SmartArray for boot, HBA for ZFS)
  • :white_check_mark: Clean PBS strategy (not tangled into production)
  • :white_check_mark: Safe 2-node cluster with proper quorum setup
  • :cross_mark: No Ceph or heavy shared storage (overkill for my use)

Thanks again for your time and your expertise. Any feedback on disk models, HBA behavior, ZFS layouts, QDevice placement, or PBS design would be extremely appreciated.

Best regards,

I can answer the PBS question.
(Currently, I am running PBS on TrueNAS as a VM, at home and work.)

Having a separate machine (as Proxmox recommends) is easier, and a bare-metal install is easier still. You can create a ZFS pool on the R610’s disks and use it as the datastore target.

It is much faster if you use a local connection. In PBS, you still have the option to create sync tasks. (Sadly, only to another PBS.)
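The CLI side of that is small, in case it’s useful (datastore name, path, remote name, and schedule are placeholders; the remote has to be registered first with proxmox-backup-manager remote create):

```
# On the standalone PBS box: create a datastore on the ZFS pool
proxmox-backup-manager datastore create backup /backuppool/pbs

# Sync task pulling from another (already registered) PBS remote
proxmox-backup-manager sync-job create offsite-sync \
  --store backup --remote remote-pbs --remote-store backup --schedule daily
```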

I like the option to browse the files in a VM backup and restore them individually, and the deduplication. That is a massive advantage over plain NFS or SMB targets.