XCP-ng on TrueNAS Core - NFS Share

Hi,

Could anyone please tell me if we still need to turn off ‘sync’ on the TrueNAS dataset when using an NFS share for XCP-ng?

Is this command still required on TrueNAS-13.0-U3.1 (Core)?

zfs set sync=disabled z2_rusty/NFS_xcp-ng
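
If it helps, you can check the current value before changing anything; this should work on the same dataset:

zfs get sync z2_rusty/NFS_xcp-ng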

Thank you

I have disabled sync for NFS shares; you can disable sync in the GUI.

Not sure if we need to turn it off, but I still did. If I remember correctly, there was a speed penalty for having sync on, and it doesn’t do a lot for our use case. Hopefully Tom will see this and give us an update on the need.

Thank you very much @Paul @Greg_E. I also hope @LTS_Tom can confirm this.

OK, I made a mistake when I set up a different NFS share and moved my VMs over to it… Sync DEFINITELY slows things down, like crazy amounts of slow. The most I was seeing while Windows was downloading and installing updates was 6.9 MBps; I turned sync off and was getting 50 MBps immediately after.

So for our uses, turning Sync off is the way to go.

I looked at the storage graphs in XO, and IO throughput went up a lot, wait time went down a lot, and latency seemed to go down a lot too. I was able to transfer files at gigabit speeds doing a copy/paste from a workstation connected with a gigabit connection. My VMs don’t have 10Gb yet; I need to move some things around over the summer.

@Greg_E thank you for doing some testing. I’ll make sure to disable sync then.

@Greg_E sync definitely has to be turned off. Executing this fio test was estimated at 3h20min with sync enabled and 20 minutes with sync disabled:

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4 --numjobs=4

Sometimes it is good to forget/ignore the advice given and test on your own… It reinforces that the advice given was good advice. I forgot, and now proved to myself that this was important. I was actually pricing out SSDs and NVMe drives that I could swap in to try to get more performance, then decided to check and make sure I did what I thought, and found I had made an error. Reinforced, and I probably won’t let that slip again.

Turning off sync is not a good idea. Tom can explain it better than I can, but the simple version is that from the perspective of the hypervisor and also the storage provider (TrueNAS in this case), there is no way for them to know what data is important/critical vs. unimportant.
As a result, all writes are sent as sync writes, basically meaning that before the next block of data can be sent to storage, the first block must be committed to storage. That way, in the event of a crash / power outage / etc., all data that was sent to disk actually IS on disk.
This is known as a copy-on-write file system (do a search for COW in Tom’s videos if you want to know more).
By turning off sync within TrueNAS, you are disabling this safety feature, and in the event of a failure you could lose data or end up with corrupted data.
For example, if the data being written when a failure occurs is an MP3 file on a VM’s storage, it’s likely not a HUGE deal… if that failure occurs while a Windows update is being applied, chances are that VM is toast and you will be looking for a backup or re-creating it. When sync writes are followed, there can be no case where the hypervisor thinks data was written when the storage provider doesn’t actually have that data written.
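
For reference, the ZFS sync property has three values (the dataset path below is just an example, not from anyone’s setup here):

zfs set sync=standard tank/vm_storage   # honor sync requests from the client (the default)
zfs set sync=always tank/vm_storage     # treat every write as synchronous (safest)
zfs set sync=disabled tank/vm_storage   # acknowledge writes from RAM immediately (fastest, unsafe on power loss)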

@albeemichael I came to the same conclusion and decided that I have 2 options…

  1. Get a crazy fast SLOG for NFS
  2. Forget NFS and use iSCSI

I think I’ll go for iSCSI for now, as I haven’t got any good SSD for a SLOG at hand.

So that’s actually a misconception… iSCSI may not enable sync writes by default, but the same data-safety issue exists on that protocol as well… if you’re using it for VMs and you care about their data consistency, you should have sync writes enabled, regardless of protocol.

Your best option is to get a SLOG with VERY low latency. The actual write speed isn’t as important, since it’s being done block by block; it’s more about how long the drive takes to actually commit a block to its internal memory.
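
If you do pick one up, attaching it later is a single command from the TrueNAS shell (pool and device names below are placeholders; substitute your own):

zpool add tank log /dev/nvd0                      # add a single SLOG device
zpool add tank log mirror /dev/nvd0 /dev/nvd1     # or a mirrored pair, worth considering for safety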

Yes, but with iSCSI, when sync is set to always, I still get the safety but not the performance hit. Do you have any suggestions on a good SLOG drive that won’t break the bank?

The underlying file system is ZFS, which is CoW. NFS sync will have no effect on this CoW operation from the filesystem level. I think Tom mentioned that in his video, but I’d need to go back and watch it again.

Also I assume that if data integrity was an issue, Tom would not have suggested turning off NFS sync. Again I should probably watch the video again to make sure I remember things correctly.

@Greg_E @albeemichael I did some testing… and here are the results:

The way my iSCSI is set up is as follows:

The ‘iscsi_storage’ dataset sync is set to ‘inherit (standard)’.
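
You can confirm what that resolves to from the shell (the pool name below is a placeholder; substitute your own):

zfs get sync tank/iscsi_storage

The SOURCE column will show whether the ‘standard’ value is inherited or the default.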

Random write test for IOPS

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4 --numjobs=4
iSCSI sync standard:	WRITE: bw=23.0MiB/s (25.2MB/s), 6136KiB/s-6165KiB/s (6283kB/s-6313kB/s), io=15.8GiB (16.9GB), run=671033-672801msec
iSCSI sync always:		WRITE: bw=2615KiB/s (2677kB/s), 654KiB/s-655KiB/s (669kB/s-670kB/s), io=15.0GiB (17.2GB), run=6404989-6414651msec
iSCSI disable:			WRITE: bw=26.1MiB/s (27.4MB/s), 6681KiB/s-6688KiB/s (6841kB/s-6849kB/s), io=15.5GiB (16.6GB), run=607835-607922msec

Random Read test for IOPS

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread --ramp_time=4 --numjobs=4
iSCSI sync standard:	READ: bw=199MiB/s (208MB/s), 49.6MiB/s-49.7MiB/s (52.0MB/s-52.1MB/s), io=15.2GiB (16.4GB), run=78551-78628msec
iSCSI sync always:		READ: bw=202MiB/s (212MB/s), 50.4MiB/s-50.7MiB/s (52.9MB/s-53.1MB/s), io=15.2GiB (16.3GB), run=76867-77188msec
iSCSI disable:			READ: bw=198MiB/s (207MB/s), 49.4MiB/s-49.5MiB/s (51.8MB/s-51.9MB/s), io=15.3GiB (16.4GB), run=78893-79008msec

Sequential write test for throughput

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=write --ramp_time=4 --numjobs=4
iSCSI sync standard:	WRITE: bw=372MiB/s (390MB/s), 92.9MiB/s-93.0MiB/s (97.4MB/s-97.5MB/s), io=15.4GiB (16.5GB), run=42357-42451msec
iSCSI sync always: 		WRITE: bw=20.9MiB/s (21.0MB/s), 5361KiB/s-5454KiB/s (5490kB/s-5585kB/s), io=15.0GiB (17.2GB), run=768288-781599msec
iSCSI disable:			WRITE: bw=449MiB/s (470MB/s), 112MiB/s-113MiB/s (117MB/s-118MB/s), io=15.3GiB (16.4GB), run=34739-34879msec

Sequential Read test for throughput

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=read --ramp_time=4 --numjobs=4
iSCSI sync standard:	READ: bw=873MiB/s (916MB/s), 218MiB/s-220MiB/s (229MB/s-231MB/s), io=13.8GiB (14.8GB), run=16034-16135msec
iSCSI sync always:		READ: bw=865MiB/s (907MB/s), 216MiB/s-217MiB/s (227MB/s-227MB/s), io=14.2GiB (15.2GB), run=16718-16762msec
iSCSI disable:			READ: bw=889MiB/s (933MB/s), 222MiB/s-223MiB/s (233MB/s-233MB/s), io=13.9GiB (14.9GB), run=15960-15984msec

So, in my scenario, when I have my iSCSI sync = standard, does it mean that sync is off?

I’m afraid I haven’t worked with iSCSI in a long time, so I’m not sure. I thought about it as I was setting up my production system, and then decided NFS was a better choice for me. Here is what I get off of both my systems:

Lab system with 4 drives in the array.

And here is the production array with 8 drives.

I was surprised that the read speed was very similar and that write was pretty fast for what it is. Both arrays are using NFS, no sync, and connected to the VM by a 10 Gbps connection (fiber on lab and DAC on production). Both from a Windows Server host.

@fred974 iSCSI sync by default is off, so basically yes, that’s my understanding.

However, there is some difference in your test between sync standard and sync disabled, which I’m thinking is due to the test you’re running. Depending on how the data is being written in the test, it might be forcing sync on only SOME of the writes instead of ALL… whereas from a hypervisor, ALL writes will be sent sync for data-protection purposes.
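
If you want that protection on iSCSI, my understanding is you force it on the zvol/dataset backing the extent (the path below is a placeholder based on your dataset name):

zfs set sync=always tank/iscsi_storage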

As for low cost SLOG devices…
Read this webpage:

Then I would go and compare the different prices. You can also check out r/homelabsales, as there are often good deals on there, and the community is pretty sweet as well (r/homelab); sales is more for, well… sales… :smiley: