Speed up ZFS scrub

I have a backup system (Raspberry Pi 4) for my home TrueNAS, which replicates data to it each day. The backup system has a single pool consisting of a single external USB hard drive (no redundancy, but that’s ok for the backup system). I have set the disk to spin down after 10 minutes since it’s only accessed once a day (there are no shares on this system).

Today I noticed that the drive wouldn’t spin down after the replication completed, and that it immediately spins back up after I spin it down manually. Turns out there is a scrub scheduled for the second Sunday of each month. Looking at the output of zpool status, I get this:

scan: scrub in progress since Sun Sep 12 00:24:10 2021
      3.05T scanned at 436K/s, 3.05T issued at 436K/s, 3.17T total
      0B repaired, 95.99% done, no estimated completion time

This is as of now, some 17 hours after the task started. This reveals that the data is checked at a speed of what I presume is 436 KiB/s. That seems rather slow. How can I speed this up? I’d like to keep scrubbing the pool on a monthly basis as there is never any data read from it under normal circumstances. I get that for a production server, it makes sense to have the scrub be a very low priority task. But for a server that is purely for backups, that strategy doesn’t make a whole lot of sense. I want the scrub to be done as fast as possible, so the drive can spin down.
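
As a rough sanity check on that figure, here is a minimal sketch in Python. It assumes zpool status uses binary units (so T means TiB and K means KiB) and that the scrub ran without interruption for the 17 hours mentioned. The helper names are made up for illustration.

```python
# Binary unit multipliers; assumes zpool status reports powers of 1024.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(text):
    """Convert a size such as '3.05T' or '436K' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]

def average_rate(scanned, hours):
    """Average bytes per second if `scanned` was covered in `hours`."""
    return to_bytes(scanned) / (hours * 3600)

avg = average_rate("3.05T", 17)
print(f"average over 17 h: {avg / 1024**2:.1f} MiB/s")
print(f"reported rate:     {to_bytes('436K') / 1024**2:.2f} MiB/s")
```

Under those assumptions the average works out to roughly 52 MiB/s, far above the displayed 436K/s, so the displayed rate probably does not describe the whole 17 hours.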

Researching this, I found out about the zfs_scrub_limit and zfs_top_maxinflight tunables, both of which seem to be deprecated.
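
To see which scrub-related tunables a system actually exposes, something like the following may help. It is only a sketch, assuming OpenZFS on Linux, where module parameters appear as files under /sys/module/zfs/parameters (on FreeBSD they are sysctls instead). The three parameter names are ones believed to exist in OpenZFS 0.8 and later; any that are absent on a given version are reported as not present.

```python
from pathlib import Path

# Assumption: OpenZFS on Linux, which exposes module parameters here.
PARAM_DIR = Path("/sys/module/zfs/parameters")

# Scrub-related parameters believed to exist in OpenZFS 0.8 and later;
# check the module parameter documentation for your version.
SCRUB_PARAMS = [
    "zfs_scan_vdev_limit",
    "zfs_vdev_scrub_min_active",
    "zfs_vdev_scrub_max_active",
]

def read_params(names, base=PARAM_DIR):
    """Return {name: value}, with None for parameters that are not present."""
    values = {}
    for name in names:
        try:
            values[name] = (base / name).read_text().strip()
        except OSError:
            values[name] = None
    return values

for name, value in read_params(SCRUB_PARAMS).items():
    print(f"{name} = {value if value is not None else '(not present)'}")
```

This only reads the current values. Whether changing any of them helps on a single USB disk is something to test, not a given.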

I’ve never had that problem, but a scrub does take time. Let the job finish, then make whatever adjustments you deem necessary. Walking on the wild side with one drive defeats the whole reason for ZFS. At least do a mirror.

Well, which adjustments to make is what I’m trying to figure out.

And regarding the “no redundancy” matter: This is only a backup of my data. I personally don’t feel it defeats the point of ZFS, since it has other advantages, snapshots in particular. When you plug a USB drive into a machine locally to make some backups, I think most people wouldn’t really be concerned with RAID on the backup drive. It’s just like that in my situation, except that I can plug in my USB backup drive from anywhere in the world over the network (it connects to my VPN). Anyways, I understand that there are different philosophies on how to do backups correctly, but that’s not really the topic of my question.

Check CPU / RAM usage as well as disk activity. You’re probably IO bottlenecked somewhere.

My 6TB ZFS volume only takes 00:08:47 (I believe 9 minutes). But that’s running on an E5-2620, 96GB of RAM, and 22x 10K SAS drives with an NVMe log drive.

I manually started a scrub to check the system performance. Before starting the scrub htop showed this:

Afterwards it’s like this:

So there’s around 25% CPU usage by the kernel and memory usage is about 15%. Doesn’t look to me like this is the bottleneck.

However, for some reason the scrub seems to go faster now:

scan: scrub in progress since Tue Sep 14 12:26:56 2021
      484G scanned at 1.41G/s, 55.1G issued at 165M/s, 3.18T total
      0B repaired, 1.69% done, 05:31:17 to go

Maybe it had something to do with the fact that a replication started while the original scrub was going on? This suggests that accessing or writing data to the pool will kill the scrub performance.
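
The status output above shows two rates, and the time estimate can be used to check which one matters. A minimal sketch, assuming binary units (T = TiB, G = GiB, M = MiB); the helper names are made up for illustration.

```python
# Binary unit multipliers; assumes zpool status reports powers of 1024.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(text):
    """Convert a size such as '3.18T' or '165M' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]

def eta_seconds(total, issued, issue_rate):
    """Seconds left if the remaining data is processed at the issued rate."""
    return (to_bytes(total) - to_bytes(issued)) / to_bytes(issue_rate)

secs = int(eta_seconds("3.18T", "55.1G", "165M"))
hours, rest = divmod(secs, 3600)
minutes, seconds = divmod(rest, 60)
print(f"estimated time left: {hours:02d}:{minutes:02d}:{seconds:02d}")
```

The result lands within seconds of the reported 05:31:17, which suggests the estimate is driven by the issued rate (165M/s) and not by the much higher scanned rate (1.41G/s).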

Anyways, I’m not gonna pursue this any further, it’s just not worth the time.
