TrueNAS Scale - Miserable NVME Speeds

Hi,

I’ve recently started experimenting with TrueNAS Scale thanks to Tom’s videos (I’d only used Synology before), and I recently got a pair of R730XDs with dual CPUs, 128GB of RAM and a 10G interface. I’ve set it up with 10x 4TB SAS drives and four 2TB Samsung 980 Pro NVMe SSDs. Initially, I set up the HDD pool as a single RAIDZ2 VDEV with an iSCSI share to XCP-NG over a dedicated 10G interface, and did the same with the NVMe pool. I was fairly happy with the CrystalDiskMark results on a Windows VM for the HDD pool, though I’m not sure if it should be faster. The NVMe pool over iSCSI was a different story, especially the 4k reads and writes, which were barely faster than the HDD pool.

I thought this might be something to do with the iSCSI block size setting on TrueNAS, but XCP-NG refused to work with other sizes, and NFS was even worse for 4k writes. I then tried various combinations of striped NVMe drives, with and without a log device, but it was all broadly the same. I then thought it might be something with the network interface, so I set up two VMs directly on TrueNAS. The reads and writes were much faster, but the 4k random was still awful. I even tried booting straight into Windows and reformatting the SSDs to make sure there was nothing wrong with them, as they were second-hand; I got the full speed out of them as expected, with about 70MB/s 4k random. The NVMe SSDs are all on the same x16 slot from one CPU via an Asus HYPER M.2 X16 PCIe adapter.

  1. HDD pool result over 10G iSCSI - RAIDZ2, 1 VDEV
  2. HDD pool result with VM directly on TrueNAS - RAIDZ2, 1 VDEV
  3. NVMe pool result with VM directly on TrueNAS - stripe, 1 VDEV. This result is for a single NVMe SSD; the stripe of four NVMe drives in one VDEV was not much different at 4k and about double for the rest

This is very puzzling to me and I’m wondering if there is something I’m missing here, or whether ZFS is just not good at random IO. I could not really find anything useful online and I’m getting tired of destroying and rebuilding different pools!

I’m purely experimenting at this stage, so please don’t comment on setup issues such as striped VDEVs, etc.! And I’m not looking for bleeding-edge performance either, so I don’t think it matters much that I’m using CrystalDiskMark through Windows.
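For context, the iSCSI block size I mentioned is, as far as I understand it, the volblocksize of the zvol backing the extent. A rough CLI equivalent of what I was changing in the GUI would be something like this (the pool and zvol names are just placeholders, and volblocksize can only be set when the zvol is created):

# create a sparse 500G zvol with a 16K volume block size for the iSCSI extent
zfs create -s -V 500G -o volblocksize=16K nvme-pool/iscsi-test
# confirm what the zvol actually got
zfs get volblocksize,sync,compression nvme-pool/iscsi-test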

Thanks in advance.

I have not had good luck getting performance out of those in a Dell server. I did not really dig deep into the problem, but moving that same card to a standard AMD system that supported bifurcation was MUCH faster.
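One quick sanity check that might help: if bifurcation is actually active, each SSD should enumerate as its own PCIe device, so something like the following from the TrueNAS shell should show four separate controllers (nvme list needs nvme-cli, which may or may not be present):

# each 980 Pro should appear as its own "Non-Volatile memory controller"
lspci | grep -i "non-volatile"
# list the NVMe namespaces the OS actually sees
nvme list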

When setting up NFS or iSCSI on ZFS without a SLOG, make sure you have SYNC=OFF on the datasets.
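From the shell that maps to the ZFS sync property, roughly like this (the dataset/zvol path is just an example):

# check the current setting, then disable sync writes for the iSCSI zvol
zfs get sync nvme-pool/iscsi-test
zfs set sync=disabled nvme-pool/iscsi-test

Just keep in mind that with sync disabled and no SLOG, a power loss can lose the last few seconds of acknowledged writes.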

Thanks, that’s led me down a new avenue to research and I’m finding more promising results on Google now. I was researching this from a ZFS angle before, as I was getting normal speeds without.

So I spent many moons trying to figure this out further, including trying TrueNAS Core and messing about with NUMA settings in the BIOS, but with no result. However, out of desperation, I tried to learn how to use fio and, assuming I have run realistic tests (I’m still not familiar with what all the arguments do), I got excellent results on the NVMe striped pool directly on TrueNAS. I just can’t replicate them over iSCSI, NFS or in a VM directly on TrueNAS… I would have guessed this was a networking/NIC issue if it wasn’t for the VM results directly on TrueNAS. Either that or Windows…

I also tried fio in Debian and Windows over iSCSI, and the random results were better than with CrystalDiskMark, but the sequential was more or less the same. I got 50 - 60 MB/s (10 - 18K IOPS) on Windows and 30 - 40 MB/s (8 - 10K IOPS) on Debian. Both used the same fio arguments as the results below, which were run directly on TrueNAS.
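For reference, the two result blocks further down correspond to invocations roughly along these lines (going by the output headers; the test directory and file size here are just placeholders):

# 128k mixed sequential read/write, 16 jobs, 120 seconds (TEST1 below)
fio --name=TEST1 --directory=/mnt/nvme-pool/fio-test --size=10G \
    --rw=rw --bs=128k --ioengine=psync --iodepth=16 \
    --numjobs=16 --runtime=120 --time_based --group_reporting

# 4k mixed random read/write, otherwise identical (TEST2 below)
fio --name=TEST2 --directory=/mnt/nvme-pool/fio-test --size=10G \
    --rw=randrw --bs=4k --ioengine=psync --iodepth=16 \
    --numjobs=16 --runtime=120 --time_based --group_reporting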

The only thing I can think of now is that this is to do with the dual CPUs in the server. I’m hoping to take one out and see if it makes any difference.
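Before pulling a CPU out, I’m planning to check which NUMA node the M.2 adapter actually hangs off; something like this should show it (the PCI address is a placeholder, it would be whatever lspci reports for the SSDs):

# which CPU socket owns the slot: 0 or 1 (-1 if the kernel doesn’t know)
cat /sys/bus/pci/devices/0000:XX:00.0/numa_node
# if numactl is available, show cores/memory per node so fio can be pinned
# to the same node as the adapter for a comparison run
numactl --hardware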

TEST1: (g=0): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=16
...
fio-3.28
Starting 16 processes

TEST1: (groupid=0, jobs=16): err= 0: pid=6745: Tue Feb 13 05:28:10 2024
  read: IOPS=96.9k, BW=11.8GiB/s (12.7GB/s)(1419GiB/120001msec)
    clat (usec): min=11, max=46859, avg=66.13, stdev=195.72
     lat (usec): min=11, max=46859, avg=66.25, stdev=195.76
    clat percentiles (usec):
     |  1.00th=[   19],  5.00th=[   28], 10.00th=[   33], 20.00th=[   36],
     | 30.00th=[   38], 40.00th=[   41], 50.00th=[   45], 60.00th=[   49],
     | 70.00th=[   56], 80.00th=[   67], 90.00th=[   91], 95.00th=[  125],
     | 99.00th=[  412], 99.50th=[  807], 99.90th=[ 2147], 99.95th=[ 3064],
     | 99.99th=[ 7439]
   bw (  MiB/s): min= 8517, max=17934, per=100.00%, avg=12124.46, stdev=97.73, samples=3808
   iops        : min=68134, max=143469, avg=96988.62, stdev=781.85, samples=3808
  write: IOPS=97.0k, BW=11.8GiB/s (12.7GB/s)(1421GiB/120001msec); 0 zone resets
    clat (usec): min=13, max=50678, avg=93.87, stdev=230.79
     lat (usec): min=14, max=50680, avg=96.42, stdev=231.31
    clat percentiles (usec):
     |  1.00th=[   26],  5.00th=[   41], 10.00th=[   48], 20.00th=[   51],
     | 30.00th=[   55], 40.00th=[   58], 50.00th=[   63], 60.00th=[   69],
     | 70.00th=[   77], 80.00th=[   89], 90.00th=[  123], 95.00th=[  192],
     | 99.00th=[  750], 99.50th=[ 1172], 99.90th=[ 2606], 99.95th=[ 3654],
     | 99.99th=[ 8455]
   bw (  MiB/s): min= 8521, max=17901, per=100.00%, avg=12139.76, stdev=97.40, samples=3808
   iops        : min=68169, max=143203, avg=97111.14, stdev=779.19, samples=3808
  lat (usec)   : 20=1.00%, 50=38.25%, 100=49.08%, 250=9.19%, 500=1.32%
  lat (usec)   : 750=0.39%, 1000=0.22%
  lat (msec)   : 2=0.41%, 4=0.10%, 10=0.03%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=3.71%, sys=67.74%, ctx=5972339, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=11625618,11640426,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=11.8GiB/s (12.7GB/s), 11.8GiB/s-11.8GiB/s (12.7GB/s-12.7GB/s), io=1419GiB (1524GB), run=120001-120001msec
  WRITE: bw=11.8GiB/s (12.7GB/s), 11.8GiB/s-11.8GiB/s (12.7GB/s-12.7GB/s), io=1421GiB (1526GB), run=120001-120001msec

Sequential (128k mixed read/write) results directly on the TrueNAS machine (100K - 120K IOPS)

TEST2: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=16
...
fio-3.28
Starting 16 processes

TEST2: (groupid=0, jobs=16): err= 0: pid=6916: Tue Feb 13 05:42:15 2024
  read: IOPS=66.8k, BW=261MiB/s (274MB/s)(30.6GiB/120001msec)
    clat (usec): min=2, max=75315, avg=57.21, stdev=87.24
     lat (usec): min=2, max=75315, avg=57.37, stdev=87.26
    clat percentiles (usec):
     |  1.00th=[    6],  5.00th=[    8], 10.00th=[    9], 20.00th=[   10],
     | 30.00th=[   12], 40.00th=[   14], 50.00th=[   57], 60.00th=[   70],
     | 70.00th=[   92], 80.00th=[  102], 90.00th=[  119], 95.00th=[  135],
     | 99.00th=[  208], 99.50th=[  262], 99.90th=[  449], 99.95th=[  562],
     | 99.99th=[  840]
   bw (  KiB/s): min=204509, max=301640, per=100.00%, avg=267390.06, stdev=646.86, samples=3824
   iops        : min=51123, max=75402, avg=66841.87, stdev=161.75, samples=3824
  write: IOPS=66.8k, BW=261MiB/s (274MB/s)(30.6GiB/120001msec); 0 zone resets
    clat (usec): min=8, max=75312, avg=177.95, stdev=159.42
     lat (usec): min=8, max=75313, avg=178.16, stdev=159.42
    clat percentiles (usec):
     |  1.00th=[   57],  5.00th=[   71], 10.00th=[   77], 20.00th=[   92],
     | 30.00th=[  111], 40.00th=[  122], 50.00th=[  141], 60.00th=[  165],
     | 70.00th=[  194], 80.00th=[  235], 90.00th=[  310], 95.00th=[  408],
     | 99.00th=[  717], 99.50th=[  791], 99.90th=[  922], 99.95th=[ 1123],
     | 99.99th=[ 1418]
   bw (  KiB/s): min=206722, max=297379, per=100.00%, avg=267455.24, stdev=544.66, samples=3824
   iops        : min=51676, max=74342, avg=66857.65, stdev=136.21, samples=3824
  lat (usec)   : 4=0.05%, 10=10.79%, 20=11.94%, 50=1.52%, 100=26.04%
  lat (usec)   : 250=40.69%, 500=7.40%, 750=1.17%, 1000=0.35%
  lat (msec)   : 2=0.03%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=2.48%, sys=50.08%, ctx=7819836, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=8014925,8016768,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=261MiB/s (274MB/s), 261MiB/s-261MiB/s (274MB/s-274MB/s), io=30.6GiB (32.8GB), run=120001-120001msec
  WRITE: bw=261MiB/s (274MB/s), 261MiB/s-261MiB/s (274MB/s-274MB/s), io=30.6GiB (32.8GB), run=120001-120001msec

4k random results directly on TrueNAS (66k IOPS)