XCP-NG with NFS SRs - VMs go into read-only mode when NFS issues occur

This is a bit hard to describe in a title, so I’m hoping to clarify here.

I have an XCP-NG host with a few VMs that use SRs that are connected to the XCP-NG host via NFS. Meaning: to the VM, it’s just a normal disk, but the host connects to the physical storage device via NFS.

Occasionally, some issue occurs and the VM loses connectivity to the NFS-connected SR for a period of time. (Note that these are Debian Bullseye VMs.) When this happens, the VM usually flips the filesystem on that disk into read-only mode. In all cases, the NFS SRs are secondary disks I use for storing larger files (e.g., the Nextcloud data directory), so it’s acceptable to attempt to remount them read-write while the VM is up. However, this has never worked, at least in my attempts.

My questions:

  1. Can I tweak the VM’s fstab to force it to automatically remount read-write? (See the sketch after this list.)
  2. Can I also give the VM more “tolerance” for errors on this disk, kind of like a normal NFS share, where it will queue a certain number of operations before it considers the disk failed?
  3. Is there anything I should do on the XCP-NG host or the storage device (TrueNAS) to mitigate these issues? They don’t happen often, but they frequently occur without my realizing it, until I suddenly notice that Nextcloud is offline or some other piece of infrastructure has failed.
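For concreteness, what I have in mind for question 1 is something like the fstab line sketched below (the device name and mountpoint are placeholders for my actual setup, and I’m not sure errors=continue is actually safe here):

# /etc/fstab on the VM (hypothetical device and mountpoint)
# errors=continue would keep ext4 from remounting read-only on I/O
# errors (the default policy is errors=remount-ro); if the disk really
# did fail, this trades safety for availability, so fsck afterwards.
/dev/xvdb1  /srv/nextcloud-data  ext4  defaults,errors=continue  0  2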

XCP-NG, like most hypervisors, expects persistent storage. I have never tried to mitigate storage-loss issues at the hypervisor level; we generally try to make the storage itself more robust.

I have a backup TrueNAS server and a PowerShell script that, if it sees the storage down for 30 seconds (in your case, the cloud), switches DNS records on all DCs and GCs. Enterprise servers fail very rarely. I also have a co-loc that I replicate changes to for DR (living in Florida, hurricanes are a threat). My backup server is a VM, and I have another server that I can simply move my 24 SSD drives to and upload my backup config. But after almost two years I have never had a failure at the host level; only one memory stick failed due to ECC errors.

I’m redundant at every single point in the company’s internal mini datacenter, and if a hurricane comes I switch to DR, where I have every user workstation and the co-loc machine is constantly updated. I also do backups via Backup Exec to the Windows shares daily and replicate the small amount of daily changes, and it goes to tape every 2-3 days. I also do snapshots every 4 hours to protect against ransomware attacks (which haven’t happened). I’m using old gear (R720xd) for the TrueNAS servers, but the disks are fast and the data is almost all text, not bigger stuff like video editing. Always be redundant. I don’t run VMs on TrueNAS either; I have 2 host servers where I split my DCs and other servers of importance.

Thanks Tom. To be fair, I think my problem is that my UPS batteries are 6 years old…I did a quick “pull the plug” test the other day and went from 100% to 68% in 9 seconds! I’m thinking the UPS is self-testing and that’s causing my primary issue.


Dan,

Thanks for posting about this, as I too was plagued with this ongoing NFS SR issue for a while and found it hard to describe (…not anymore). After some digging, I found in the XCP-ng NFS Python code (/opt/xensource/sm/nfs.py) that there are two main “other-config” options that you can pass to an NFS-based SR mount at the hypervisor level. You can pass an “nfs-timeout” value (which corresponds to the NFS “timeo” option) and/or an “nfs-retrans” value (which corresponds to the NFS “retrans” option).

The defaults in XCP-ng should be:

  • nfs-timeout (timeo) = 200 (this is in tenths of a second, i.e. 20 seconds)
  • nfs-retrans (retrans) = 4

You should see in your current NFS mounts at the command line (just type mount) “…,timeo=200,retrans=4,…” in the NFS options if these defaults have not been changed.
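On the XCP-ng host you can also narrow the output to just the NFS mounts, e.g.,

mount -t nfs,nfs4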

You can change either of these values to make an NFS-based SR “more resilient” to NFS/network outages. The math/algorithm behind the settings should also be in the same “nfs.py” code, in the header comments between lines ~27 and 46. I used this info and ChatGPT to compute tables for a fixed nfs-timeout with varying nfs-retrans values, and for a fixed nfs-retrans with varying nfs-timeout values.
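As a rough rule of thumb (my own reading, based on the linear backoff described in nfs(5) for TCP mounts, where attempt i waits i * timeo capped at 600 seconds; the authoritative math is in those nfs.py comments), you can estimate the outage a mount can bridge with a quick shell loop:

# Rough outage budget in seconds for given timeo/retrans values.
# timeo is in tenths of a second; retrans+1 attempts are made in total.
timeo=200; retrans=4; total=0
for i in $(seq 1 $((retrans + 1))); do
  t=$(( i * timeo / 10 ))      # assumed linear backoff: i * timeo
  [ "$t" -gt 600 ] && t=600    # assumed per-attempt cap of 600 s
  total=$(( total + t ))
done
echo "~${total}s"              # defaults (200/4) -> ~300s

By that estimate, the defaults bridge roughly 5 minutes, while either nfs-timeout=4200 with retrans=4 or nfs-timeout=200 with retrans=24 comfortably covers a 7-minute outage.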

To gauge how long I would need an SR backend to wait for the target to reboot/recover, I rebooted all of my FreeNAS/TrueNAS appliances one by one and timed how long each took to fully recover (then added about 10% for good measure). We have several older servers that we use for testing/experimenting, and the slowest reboot we had was 6 min 30 sec. So, my nfs-timeout and nfs-retrans settings are based on bridging a 7-minute outage, i.e., 420 seconds or 4200 tenths of a second.

I tested two configurations: (1) nfs-timeout = 4200 with the default nfs-retrans = 4, and (2) the default nfs-timeout = 200 with nfs-retrans increased to 24. Both worked! None of our VM disks using the rebooted NFS-backed SR went into a read-only state. You might see some processes/operations stall during the outage period (e.g., no new ssh logins to an impacted VM, though existing sessions stayed open), but once the NFS target comes back, all queued-up operations are flushed and things continue. I did not need to reboot any of my VMs.

You should still see 1 entry in /var/log/kern.log indicating that the NFS server is not responding, e.g.

==> /var/log/kern.log <==

DATE HYPERVISOR kernel: [#] nfs: server IP not responding, timed out

but you should NOT see any ERRORS in the daemon.log indicating that there is an I/O issue, e.g.,

==> /var/log/daemon.log <==

DATE HYPERVISOR tapdisk[#]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: - Input/output error

To get a list of your NFS-based SRs you can use,

xe sr-list type=nfs

To list the current parameters of an NFS SR you can use,

xe sr-param-list uuid=INSERT_YOUR_SR_UUID_HERE
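If you only want the other-config map rather than the full parameter dump, this should work too:

xe sr-param-get uuid=INSERT_YOUR_SR_UUID_HERE param-name=other-config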

Note the “other-config” line in the output. If you want to match my settings you can use the following. Note: I set both nfs-retrans and nfs-timeout explicitly just in case the XCP-ng defaults get changed between updates.

xe sr-param-set other-config:nfs-retrans=120 uuid=INSERT_YOUR_SR_UUID_HERE

xe sr-param-set other-config:nfs-timeout=200 uuid=INSERT_YOUR_SR_UUID_HERE

NOTE: After making changes, you will need to shut down all VMs using the NFS SR and unplug/replug the NFS SR, or simply reboot the hypervisor(s), for the new settings to take effect. On reboot, you can verify that these new other-config settings were applied with the mount command, e.g., “…,timeo=200,retrans=120,…”.
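If you want to script it, the basic shape is a loop over the NFS SRs and their PBDs, something like the sketch below (only run this with the affected VMs shut down; it applies the same settings as above):

#!/bin/bash
# Sketch: apply the settings above to every NFS SR on this host, then
# unplug/replug each SR's PBDs so the new mount options take effect.
for sr in $(xe sr-list type=nfs --minimal | tr ',' ' '); do
    xe sr-param-set uuid="$sr" other-config:nfs-retrans=120
    xe sr-param-set uuid="$sr" other-config:nfs-timeout=200
    for pbd in $(xe pbd-list sr-uuid="$sr" --minimal | tr ',' ' '); do
        xe pbd-unplug uuid="$pbd"
        xe pbd-plug uuid="$pbd"
    done
done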

A copy of my helper script to change all NFS SRs on a hypervisor to these settings can be found at the github link below,