How to handle XCP-ng maintenance

The background before my question
XCP-NG as the Hypervisor
20 Virtual Machines
2x servers in a pool.
Server 1: Dell R620, 128GB memory
Server 2: HP Z640, 64GB memory
Storage: NFS on Synology NAS

Server 2 is preferred as it has an NVMe drive and is performant for VMs like Graylog and other workloads.

The issue I have is that there are server updates that require a reboot, and I can't migrate my VMs between the servers due to the memory limitation.

My question is: what is the best way to handle maintenance tasks that require server reboots? Do I just go in and reboot the server with the VMs still running? Is there a less brute-force way of doing this?
Additionally, I will need to update my Synology NAS soon, and seeing how it's used for the majority of my VDIs, how do I handle that maintenance? I don't want to shift my VMs to another NAS, as it's a much weaker system and will most likely tip over.

This is a home lab, hence the inequality in system specs.

The proper infrastructure for an always-on environment:

  1. A SAN with multi-node support, like a TrueNAS with two controllers.
  2. XCP-ng hosts with enough memory for all VMs to be migrated to the other hosts during updates (a quick way to check this is sketched after the list).
  3. XCP-ng using the TrueNAS for its SR.
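
As a quick sanity check for point 2, here's a hedged sketch using the xe CLI from any pool member; the fields are standard XAPI host/VM parameters, but treat the workflow itself as a suggestion:

```sh
# Memory headroom per host across the pool
xe host-list params=name-label,memory-free,memory-total

# What each running VM would roughly need on the target host
# (its current dynamic memory), excluding dom0
xe vm-list is-control-domain=false params=name-label,memory-dynamic-max
```

If the second list doesn't fit into the free memory of the remaining host, evacuation will fail, which is exactly the situation described in the question.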

With a dual-controller SAN you can update one controller, fail over, and then upgrade the other controller without downtime.

With the proper resources on your XCP-ng nodes it is simple enough to evacuate a node for upgrades and then migrate the VMs back. Or, if you run XO from sources, you can run a Rolling Pool Update, and if you have the load-balancing module enabled it will move the VMs back to their original node.
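
For the evacuate-and-patch flow, a minimal sketch with the xe CLI (the host name `xcp-host-2` and the VM name `graylog` are placeholders; substitute your own):

```sh
# Resolve the host UUID from its name label
HOST=$(xe host-list name-label="xcp-host-2" --minimal)

xe host-disable uuid=$HOST     # stop new VMs from starting on this host
xe host-evacuate uuid=$HOST    # live-migrate all resident VMs to other pool members

# ...patch and reboot the host, e.g. over SSH: yum update -y && reboot ...

xe host-enable uuid=$HOST      # allow VMs to run here again
xe vm-migrate vm="graylog" host="xcp-host-2" live=true   # move VMs back one by one
```

Note that `host-evacuate` only succeeds if the other pool members have enough free memory for the evacuated VMs, which is precisely the constraint in the original question.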

Because you don't have this type of setup, the best you can do is a scheduled maintenance window with the least-impacting downtime.

The best way to do this is to shut down all of the VMs on the node; when that is complete, I simply run the upgrade in an SSH session and then reboot.
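
As a sketch of that flow (my own rendering of the commands, not necessarily how @xMAXIMUSx does it), run in the SSH session on the host being patched:

```sh
# Assumes the host's XAPI name-label matches its hostname
HOST=$(xe host-list name-label="$(hostname)" --minimal)

# Gracefully shut down every non-dom0 VM resident on this host
# (needs guest tools in the VMs; add --force only as a last resort)
for VM in $(xe vm-list resident-on=$HOST is-control-domain=false --minimal | tr ',' ' '); do
    xe vm-shutdown uuid=$VM
done

yum update -y   # XCP-ng 8.x patches dom0 via yum
reboot
```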

For your SAN upgrades I do the same thing: shut down all VMs, then patch your SAN and reboot. Once it comes back up, turn your VMs back on.

If the guest tools are installed in all of the VMs, that should be fine, as XCP-ng should then shut down the VMs gracefully before it reboots itself. See the following thread in the XCP-ng forums for more info: reboot of host does it stop or kill running VM's? | XCP-ng and XO forum
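
One way to check that beforehand (my suggestion, not from the linked thread): list which VMs actually report guest tools, since a VM without tools can't receive a clean shutdown request and will likely be forced off instead:

```sh
# VMs reporting an empty PV-drivers-version have no guest tools installed
xe vm-list is-control-domain=false params=name-label,PV-drivers-version
```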

See @xMAXIMUSx's answer. Alternatively, you could just shut down the host rather than shutting down all the VMs individually, which should again shut down your VMs gracefully before the host itself powers off.

Are those two servers in a pool? If so, use “Rolling Pool Update”; there is a new “Rolling Pool Reboot” button too. That said, you might have different architectures on the processors, and I'm not sure whether that will be an issue.
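
If you want to check the processor question up front, you can compare what XAPI records for each host (a general sanity check, not something the Rolling Pool features ask you to run):

```sh
# Vendor and feature flags must be compatible for the live migrations a
# Rolling Pool Update performs; pools level CPU features to a common
# subset when hosts join, but only within the same vendor.
xe host-list params=name-label,cpu_info
```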

The Rolling Pool options may or may not work so well with local storage; I have all my VMs on a NAS.

The actual question has already been answered, and @michmoor even told us the reason why the VMs cannot be migrated over.

And let's be honest, a few minutes of downtime every few weeks isn't a real issue in a homelab. It's also easier and cheaper to just reboot the host than to provide the appropriate resources for all your VMs on at least two servers. HA is just not necessary in a homelab.

Maybe, but local storage is usually the better option in a home lab because it doesn't need 25G networking for decent performance, unless you want a cluster of multiple VM hosts with local storage to get HA. Btw, running all your VMs on a single NAS (assuming it's not a Ceph cluster or similar) isn't really HA either, because that NAS also needs to be rebooted from time to time :wink:

For most home lab stuff, gigabit is not a big deal. I ran my lab on gigabit for several years and it was tolerable for every Windows domain I ever needed to model. I wouldn't worry too much about an extra minute here and there if it saves you the money for a 10 Gbps switch, cards, modules, and cables. Especially so if you are using low-powered hosts with N100, N200, etc. processors.

Is more speed nice? Yes. Is it required for learning? No. My storage only gets me 200 MB/s writes and 400 MB/s reads, so that 10 Gbps system isn't getting me a huge amount of performance. I quote those figures because I just put in a different drive controller card to see if I could bump the numbers higher; I'm going back to the old card because the new one is no faster, and the old card integrates with the chassis so I get nice flashy lights. I wish I could justify the money for some SSDs for this system, but it's really hard to justify the $400 needed, and it wouldn't even expand the amount of storage at that price.

Of course you can do whatever you like, and there may even be good reasons for it, learning obviously being one of them :slight_smile:

But to get the most for less in a home “production” environment like the one I run, I wouldn't do it that way.