Proxmox multi-node SMB/CIFS question

In my home lab I am running a multi-node Proxmox setup with no clustering. Let’s call the nodes A and B.

On node A there is a TrueNAS installation that exports a share via SMB. On node B I have added this share via Datacenter > Storage as a target for disk images.

Also on node B there is a VM, let’s call it B1VM, that has one of its disks on this SMB/CIFS storage.

Given A and B are not part of a cluster, when I start both nodes how can I make sure node B does not start B1VM until Proxmox has successfully mapped the SMB storage from node A? Am I overthinking it and does this happen by default, so a VM doesn’t start until all its resources are available?

I haven’t tried this per se, but I suspect it might just fail the VM start job depending on whether the share is available or not. If I recall correctly, it behaves similarly when requiring boot ISOs over shared storage. You could add a startup delay to the VM, but it will still be hit and miss.

A bigger issue for me would be the performance hit of running that VM disk image over SMB/CIFS. If the image storage really can’t be moved local, consider iSCSI or Ceph. Having said that, managing either of these would also benefit from running in a cluster, which would give you other abilities such as High Availability.

Thank you for the thoughtful reply!

The performance in this particular case isn’t a big deal. It’s mostly lots of data that needs to be written somewhere over time, and seldom read. The reliability of the mounting process interested me more.

I realize all this could be done reliably via systemd units with automount at guest level; this is more of an exercise in how it can be achieved reliably than a practical “this must be so.” The end goal is that I could potentially add some cheap small-disk nodes (or even diskless ones booting over the network) in the future that load all the VM data from the NAS.

I have considered iSCSI and will play with it in the future as a learning exercise, but again, for this particular case it’s overkill.

About the cluster I’ve thought long and hard, and have come up with more cons than pros, hence no cluster. Here’s a brief overview:

  1. I only have 3 physical nodes. If only one of them is powered on, the cluster doesn’t reach quorum. I’d like to preserve the ability to power on a single node rather than all 3. I could add some sort of Raspberry Pi or some such as a 4th node, but this adds more complexity and points of failure.

  2. While some VMs can be transparently moved between the nodes, most critical ones can’t.

For example, OPNSense is tied to a particular node because particular network cards are passed through to the guest. On other nodes there are also additional network cards passed through to various VM guests for various purposes.

On another node there’s an Nvidia Quadro passed through to a guest VM. And yet another VM, the TrueNAS installation, has both the HBA and all the disks passed through to it.

So in this particular case I feel running a cluster would only add a fair bit more complexity and more points of failure, while the benefits of live migration and HA would only apply to a handful of VMs that I can move manually between nodes anyway, since they all share the same 3 Proxmox Backup Servers. And for those that need it I have HA in other ways (at server level).

I’d like to explore further the possibility of something that starts VMs in other ways, e.g. via a service unit on the node, which can then be tied to the SMB/CIFS share being present. Or iSCSI, or any other form of synchronization between nodes. Thoughts?

EDIT: Someone suggested hookscripts elsewhere. I wasn’t aware this functionality existed; it would probably solve this.

You could handle this at the VM level by putting in a startup delay or a shutdown delay under the VM options tab. It’s a bit of a hack and not foolproof, but it should work.

Yes, that works flawlessly if both VMs are on the same node, and I’m actively using it (startup order, not delay, but similar idea) to ensure the TrueNAS VM is started first, then all other VMs on that machine. Not so much if they’re on separate nodes, as the delay needed could be anywhere from a couple of seconds to minutes, maybe more.
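For reference, on the node that hosts TrueNAS it boils down to something like this (the VM IDs are just examples):

```
# TrueNAS VM gets the lowest startup order so it comes up first
qm set 100 --startup order=1
# guests that depend on its shares start later in the order
qm set 101 --startup order=2
# optionally, order=1,up=120 would also wait ~120s before starting the next guest
```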

Ideally it should be something that keeps trying to mount that share, and if it fails sleeps for like 5 sec. Then once it succeeds it proceeds to start the VM.
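Something minimal along these lines is what I have in mind (the storage path and VM ID are placeholders for whatever they end up being on node B):

```
#!/bin/bash
# keep polling until the CIFS storage is actually mounted, then start the guest
STORAGE_PATH="/mnt/pve/truenas-smb"   # Proxmox mounts CIFS storage under /mnt/pve/<storage ID>
VMID=101

until mountpoint -q "$STORAGE_PATH"; do
    sleep 5
done

qm start "$VMID"
```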

Going to look into hookscripts. Worst case scenario, I’ll learn something new that I didn’t know before :slight_smile:

Yeah, I see why you are reluctant about a cluster setup. I would add maintaining PVE version parity across the cluster as a minor con as well - at the very least it adds to the list of considerations in maintaining such a setup.

Not trying to convince you, just stating for the sake of clarity:

  • You could always change the quorum requirements if need be. So if one node is down in a 3-node cluster, pvecm expected 2 would allow the remaining two nodes to start up. As soon as the failed node rejoins, the quorum expectations will be reset.
  • You can create HA Groups in the GUI to restrict HA migrations, locking certain VMs to the relevant hosts.

Well spotted on the hookscripts, I’ve yet to try them myself. So going forward you could either:

  • Generate a hookscript per VM, which waits for the share to mount before allowing the VM to start (see the sketch after this list).
  • Add a systemd service unit, which waits for the share to mount before allowing pve-guests.service to run at startup. So the pve-guests.service unit would Require and run After your smb-mount-wait.service.
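For the hookscript route, here is a rough and untested sketch of the pre-start phase; the storage path and file name are just placeholders, and the script would be registered with qm set <vmid> --hookscript local:snippets/wait-for-smb.sh (with the Snippets content type enabled on that storage):

```
#!/bin/bash
# Proxmox calls the hookscript with two arguments: the VM ID and the phase.
vmid="$1"
phase="$2"

STORAGE_PATH="/mnt/pve/truenas-smb"   # placeholder: /mnt/pve/<storage ID> on node B

if [ "$phase" = "pre-start" ]; then
    # poll for up to ~5 minutes; a non-zero exit in pre-start aborts the VM start
    for _ in $(seq 1 60); do
        mountpoint -q "$STORAGE_PATH" && exit 0
        sleep 5
    done
    echo "SMB storage not mounted, refusing to start VM $vmid" >&2
    exit 1
fi

exit 0
```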

If it were me, I would probably go for the systemd unit Requires/After approach.

Let us know which way you go, or if you’d like us to put a quick service unit together.
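In the meantime, roughly the shape I have in mind for the unit approach; completely untested, and the unit name, storage path and timeout are just examples:

```
# /etc/systemd/system/smb-mount-wait.service
[Unit]
Description=Wait for the TrueNAS SMB storage to be mounted

[Service]
Type=oneshot
# poll until the Proxmox-managed CIFS mount shows up (path is a placeholder)
ExecStart=/bin/sh -c 'until mountpoint -q /mnt/pve/truenas-smb; do sleep 5; done'
TimeoutStartSec=600

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/pve-guests.service.d/override.conf
# (create with: systemctl edit pve-guests.service)
[Unit]
Requires=smb-mount-wait.service
After=smb-mount-wait.service
```

After a systemctl daemon-reload and systemctl enable smb-mount-wait.service, pve-guests.service shouldn’t fire at boot until the share is there (or the wait times out).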


Thank you for the insightful feedback on the cluster. I’ll give it a fair bit more thought in the (near) future.

Regarding the service units, that’s what I’m using on several VMs to map their Samba shares, since fstab isn’t always mounting them reliably. Using the same approach for the virtualization node itself is a very clever suggestion and something that didn’t cross my mind. Thank you! It’s something I’ll definitely play with, as it may solve a number of other startup ordering and dependency issues between nodes.
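For anyone reading along, the guest-side units are basically this shape; the share, mount point and credentials file below are placeholders, not my actual setup:

```
# /etc/systemd/system/mnt-data.mount   (unit name must match the mount point path)
[Unit]
Description=Data share from the NAS

[Mount]
What=//truenas.lan/data
Where=/mnt/data
Type=cifs
Options=credentials=/etc/smb-credentials,_netdev,vers=3.0

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/mnt-data.automount
[Unit]
Description=Automount for the NAS data share

[Automount]
Where=/mnt/data

[Install]
WantedBy=multi-user.target
```

Enabling the .automount (rather than the .mount) means the share only gets mounted on first access, which so far has been more forgiving of boot and network order than fstab.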

I’ll explore the hookscripts first, simply because they’re something new and shiny and make me super curious to learn more.