MS-A2 / HL15 Home Lab XCP-ng Server Concept

Hey Folks,

I wondered if someone smarter than I am on this topic could help. I'm looking to set up XCP-ng on one MS-A2 initially, with shared drive space in an HL15 holding 15 mechanical drives plus 2x 4TB NVMe drives connected via OCuLink-to-U.2 NVMe adapters. The HL15's PCIe 3.0 motherboard seems like it will support this configuration, and I was considering running the NVMe pair in a RAID-0 setup.

The storage server's 15 mechanical drives will hold backups from the VMs, plus a cloud backup service that I run for friends and family. I was thinking the NVMes would be for VM storage, and here's my reasoning: given the HL15's PCIe 3.0 speed cap, the MS-A2 with its 2x 4TB NVMe drives should make the VMs more performant. My question is: can this storage be 'replicated' to the NVMes in the HL15? I'm essentially looking to add the HL15 as a replication partner of the MS-A2 for the VM storage.

Am I thinking about this all wrong? None of this will matter for the few light Linux VMs that will run there, but there are a couple of Windows hosts that have historically run a little slow on my aging ESXi infrastructure, and I'd like to juice the performance of those VMs.

I should add: I was planning to run whatever OS comes on the HL15, unless removing it and putting something like TrueNAS on there would be preferred. (I have a forum post linking to a video Tom made about XCP-ng best practices bookmarked, and I will reference that too as I set up.) I will have either 10GbE or 25GbE DAC cables connecting the MS-A2 to the HL15 directly (if that's even possible), or go through a 25GbE SFP28 switch in between.

This will be my first foray into the HL15, XCP-ng, and the MS-A2. The HL15, the MS-A2, and the drives are already purchased. Once I have all of the VMs migrated off of ESXi, I will repurpose that slower machine and add it to the cluster, maybe just to run the XO VM and provide somewhere to move VMs during maintenance. I plan to add a second MS-A2 with the same configuration later this year, and have that three-machine cluster with shared storage take over the day-to-day.

Open to any ideas anyone has here, and thank you for your input!

In general, you should be fine running Windows VMs over NFS. It's not the fastest, but it's good enough for my lab and my production system, and I only have a 10G copper connection to my NAS. It was mostly fine even when I only had gigabit in my lab, so I wouldn't worry too much until you see an issue. Windows Updates are always the slow part and might take longer on gigabit.

Unless you put a 25G card into the MS-A2, you only get 10G ports, so there's no need to buy SFP28 cables. It might be worth putting a small 10G SFP+ switch in between, because sooner or later you will expand and want more things at 10G. MikroTik makes a small 4-5 port model and an 8 port, and of course larger ones like the CRS326-24S+2Q+. This brand may be your cheapest way into 10G switching; keep an eye on eBay or other sites for used gear as people upgrade from the smaller switches.

While many people don't love MikroTik, they do get the job done, and I have a couple (the 326 above and the 8+1 port SFP+ switch). You can run SwOS or RouterOS depending on what you want to do. They are a little faster in SwOS once you get all the ports filled up, and MikroTik publishes pretty good data sheets on the three modes and their speeds (SwOS, RouterOS bridge, RouterOS router). The CPU is a bit weak, so RouterOS in router mode gets slow when you try to work at capacity.

Why do you need to replicate? Would a backup suffice?

Thanks for the replies. The VMs will be backed up at least daily to the mechanical storage, so that's covered. The goal of the replication would be more for moving VMs to other hosts in the cluster, I think; I'm not sure. If I need to migrate a whole VM, and the VM isn't using shared storage on the storage server, then XCP-ng will need to copy everything over, and that would take some time. I was thinking that if there were instead an in-sync copy of the running VM's disk on the storage server (on NVMe), the process of moving it to the new cluster host would be faster, though not as fast as shared storage, where the disk image wouldn't need to be copied at all. Also, if there were a disk issue on the MS-A2 and the data/image was unusable, I thought this replication might help recover a closer-to-live disk image than the last daily backup, which might be 12 hours old.

Maybe xcp-ng doesn’t support this, I’m not sure. I definitely need to do some more reading on all of this, I just thought the folks here might have some direction to give me as far as architecting this setup.

I think that's what XOSTOR is supposed to do: distributed, highly available storage. XOSTOR storage is synchronized between hosts with up to three replicas, so if a host goes down, nothing is lost.

Ceph is the other solution that’s popular because Proxmox supports it out of the box.

Good question and some decent details in there :slight_smile: My advice would be, keep it simple.

An approach I'd take if this were my setup: set up the MS-A2 as your first XCP-ng host, with the NVMe drives configured as local Storage Repositories (SRs) in XCP-ng. That way, you've got your fast storage for the Windows guests.
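For reference, creating a local EXT SR on an NVMe drive is a one-liner with the `xe` CLI. This is just a sketch: the device path, SR name, and assumption that the drive is blank are all placeholders for your setup (check `lsblk` for your actual NVMe device names first).

```shell
# Grab the UUID of this XCP-ng host (single-host pool, so --minimal is unambiguous)
HOST_UUID=$(xe host-list --minimal)

# Create a local EXT SR on the first NVMe drive.
# WARNING: this formats /dev/nvme0n1 -- a hypothetical device path here.
xe sr-create host-uuid="$HOST_UUID" \
  type=ext content-type=user shared=false \
  name-label="MS-A2 NVMe SR" \
  device-config:device=/dev/nvme0n1
```

`xe sr-create` prints the new SR's UUID on success, and the SR then shows up in Xen Orchestra automatically.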

I assume you'll run Xen Orchestra as a VM on the MS-A2? If yes, remember that it will need a 'remote' configured, either NFS or SMB, for backup/replication storage, and that it will consume network bandwidth transferring to/from those remotes. So my suggestion would be to set up a storage pool on the HL15 for backups, and perhaps another storage pool for a shared NFS repository to be used by the XCP-ng 'pool' (even though it's one host now, if you add more later they can access the same shared storage).
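As a sketch, attaching an NFS export from the HL15 as a shared SR for the whole pool would look something like this; the server IP, export path, and NFS version are all hypothetical values for your environment:

```shell
# Attach an NFS export on the HL15 as a shared SR visible to every pool member.
# 192.168.1.50 and /mnt/tank/vm-shared are placeholder values.
xe sr-create type=nfs content-type=user shared=true \
  name-label="HL15 shared NFS" \
  device-config:server=192.168.1.50 \
  device-config:serverpath=/mnt/tank/vm-shared \
  device-config:nfsversion=4.1
```

The XO backup 'remote' is separate from this: it's configured in Xen Orchestra itself (Settings > Remotes), pointing at the backup export rather than the VM-storage export.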

Then, run your Windows VMs on the XCP-ng host's local NVMe, but replicate them (every 15 minutes if needed, via an XO replication job) to the NFS pool on the HL15. That way, if the worst happens to the MS-A2 or its local NVMes, you can boot the recent replica of the VM (albeit at slower throughput) very quickly, without the hassle of restoring, transferring, etc.

Then, configure XO to back up your VMs to the backup storage on your HL15. I'd keep this as a separate RAID/ZFS volume from the shared storage. It depends on your capacity needs, but I typically work on a 2x capacity plan (so if I have 1TB of VMs to back up, I ensure I have 2TB of capacity on my backup storage). Perhaps 9 mechanical drives for the backup pool and the other 6 for the shared storage pool? Just one potential split.
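The 2x rule above is easy to sanity-check with a bit of shell arithmetic; the 4TB figure here is just a hypothetical VM footprint, not anything from your actual setup:

```shell
#!/bin/sh
# Rule of thumb from above: provision 2x the protected data as backup capacity.
vm_data_tb=4        # hypothetical: total TB of VM disks you need to back up
multiplier=2        # the 2x capacity plan
backup_tb=$((vm_data_tb * multiplier))
echo "For ${vm_data_tb}TB of VMs, plan for at least ${backup_tb}TB of backup capacity"
```

With retention (multiple daily snapshots plus delta merges) the real multiplier can creep above 2x, so treat it as a floor rather than a target.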

Personally, I'd avoid the NVMe-via-OCuLink approach. I've used it in the past, and sometimes a cable bump, a power cord bump, or a software hiccup caused the drives to drop out. And if they're mirrored, you risk the mirror becoming a total loss if the host sees both NVMes disappear due to a single point of failure like the OCuLink port, cable, or dock glitching.

If you really need higher performance on the HL15, perhaps sacrifice a few bays and put SSDs in there in a RAID10/ZFS striped-mirror equivalent; that way you'll get around 1GB/sec of potential throughput, which should be more than enough. That said, I can't account for every use case: high-transaction databases would suffer terribly from the latency, but I'm not sure you're running anything that latency-sensitive.
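On TrueNAS or any ZFS box, the RAID10 equivalent is a pool of striped mirrors. A sketch with six hypothetical SSD device names (in practice you'd use stable /dev/disk/by-id paths, and TrueNAS would build this for you in the UI):

```shell
# Striped mirrors (RAID10 equivalent): three 2-way mirror vdevs striped together.
# sda..sdf are placeholder device names -- this destroys any data on them.
zpool create -o ashift=12 ssdpool \
  mirror sda sdb \
  mirror sdc sdd \
  mirror sde sdf

# Verify the vdev layout
zpool status ssdpool
```

Each extra mirror vdev adds both capacity and throughput, which is why this layout beats RAIDZ for VM workloads despite the 50% capacity cost.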

You mentioned replicating VMs so that you can boot them up on another host. I'd skip that approach. Rather, set up two XCP-ng hosts in a pool, with an NFS shared storage pool on the HL15 as their 'central store'. If you REALLY need VMs to have fast disk speeds, I often store a VM template (or just a VM itself) on the slower shared pool, and then use the XO copy-VM function to clone it to the fast storage and use it until it breaks :slight_smile: That way, I've always got at least two copies of anything important.

Hope that helps! Sorry if the above is disjointed - typed it across 3 different focus sessions with work calls in between.

Just a reminder - XCP-ng with even one host is still a pool; it's just a cluster with one node in it. So adding a second XCP-ng host to the pool is simple. You can even do it with disparate hardware, as long as the CPUs share the same overall architecture (Intel with Intel, AMD with AMD, ARM with ARM, etc.). You can't live-migrate a VM from an Intel host to an AMD host; you'd need to power it down and then power it up on the alternate-architecture host.
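Once both hosts are in the pool, a live migration is a single `xe` command (XO exposes the same action in its UI). The VM and host name-labels here are hypothetical:

```shell
# Live-migrate a running VM to another host in the same pool.
# "win-vm-01" and "ms-a2-host2" are placeholder name-labels.
xe vm-migrate vm=win-vm-01 host=ms-a2-host2 live=true
```

If the VM's disks live on the shared NFS SR, only memory state moves and the migration takes seconds; with local-SR disks, XCP-ng can still storage-migrate, but it has to copy the whole disk across the wire, which is the slow path the earlier replies were describing.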

I use 2-host XCP-ng pools at home for 'live' (10GbE all-flash NFS) and 'lab' (2.5GbE single-NVMe NFS share), and it's so easy to migrate VMs between hosts, or to store replicas of VMs on each host's local storage for when disk throughput is beneficial.