Well, that’s not the email you want to get at the completion of a resilver onto new, larger drives…
This is quite confusing, since I have never had any errors in my array before, and I run weekly scrubs. I recently added an L2ARC drive, set it to cache metadata only, and run this pre-init script: echo 0 > /sys/module/zfs/parameters/l2arc_headroom
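For anyone who wants to double-check the same settings, here is a rough sketch of how they could be read back. The helper name zfs_param is made up for illustration, and the secondarycache line assumes the metadata-only setting was applied through that dataset property on the pool; adjust if it was set elsewhere.

```shell
#!/bin/sh
# Print the current value of a ZFS kernel module parameter.
#   $1: parameter name, e.g. l2arc_headroom
#   $2: directory holding the parameters (optional; defaults to the
#       sysfs path used by the pre-init script above)
# zfs_param is a hypothetical helper name, not part of any ZFS tooling.
zfs_param() {
    param_dir="${2:-/sys/module/zfs/parameters}"
    if [ -r "$param_dir/$1" ]; then
        printf '%s=%s\n' "$1" "$(cat "$param_dir/$1")"
    else
        printf '%s: not readable under %s\n' "$1" "$param_dir" >&2
        return 1
    fi
}

# Usage on the live system:
#   zfs_param l2arc_headroom           # confirm the pre-init script took effect
#   zfs get secondarycache pergamum    # assumption: metadata-only was set here
```

The function only reads the value, so it is safe to run while the pool is in its current state.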
I also added a SLOG. Both the L2ARC and the SLOG are used enterprise SAS SSDs. I can’t say they are in perfect shape, although they are not reporting any errors. But SLOG and L2ARC devices are not pool critical, are they? Even if something were wonky with them, would that result in metadata errors on a scrub/resilver?
Last night I popped in 2 new 8 TB WD Reds (both verified with a badblocks run, no errors found) to replace 2 of my 4 TB drives (I am working through the array, replacing the 4 TB drives with 8 TB ones), and this is the error I woke up to. Before the resilver last night, zpool status showed no errors. After the resilver, I am seeing:
  pool: pergamum
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 4.65T in 09:12:38 with 3 errors on Mon Sep 8 09:50:43 2025
config:

        NAME                                      STATE   READ WRITE CKSUM
        pergamum                                  ONLINE     0     0     0
          raidz2-0                                ONLINE     0     0     0
            ab0351e8-44ea-11e8-8cad-e0071bffdaee  ONLINE     0     0     6
            670dfb97-13fc-4611-bb0f-6680649d4089  ONLINE     0     0     0
            8c42800d-40d9-432f-b918-bd4138714187  ONLINE     0     0     0
            6ebdcf54-ac93-11ec-b2a3-279dd0c48793  ONLINE     0     0     6
            72baec5e-e358-4bbe-a8b0-dd75494f725d  ONLINE     0     0     6
            8a6e6dd2-465c-4311-b62e-cce797796faf  ONLINE     0     0    12
            7a9b8d5e-a28d-11ee-aaf2-0002c95458ac  ONLINE     0     0     6
            d9238765-4851-48c5-b3cc-1650c8de1364  ONLINE     0     0     0
            d3a5a104-011f-4602-ab04-90149d8863e8  ONLINE     0     0     6
            b1d949c1-44ea-11e8-8cad-e0071bffdaee  ONLINE     0     0     6
        logs
          d4c96b7f-9ca8-46ab-836a-ca387309ac56    ONLINE     0     0     0
        cache
          8e380a80-b813-448b-9704-ed5689983c76    ONLINE     0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x1b>
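To make the checksum picture easier to read at a glance, something like the following could pull out just the devices with a non-zero CKSUM count. This is only a sketch: cksum_report is a made-up helper name, it expects zpool status text on stdin, and it assumes plain integer counts (abbreviated values such as 1.2K would be skipped).

```shell
#!/bin/sh
# List devices whose CKSUM column is non-zero, plus a running total.
# Reads `zpool status` text on stdin. cksum_report is a hypothetical
# helper name; it is not a ZFS command.
cksum_report() {
    awk '
        # Device rows have exactly five fields: NAME STATE READ WRITE CKSUM.
        # Only plain integer counts are matched (assumption: no "1.2K" style values).
        NF == 5 && $1 != "NAME" && $5 ~ /^[0-9]+$/ && $5 > 0 {
            printf "%s %s\n", $1, $5
            total += $5
        }
        END { printf "total %d\n", total }
    '
}

# Usage on the live system:
#   zpool status pergamum | cksum_report
```

Run against the status output above, this should list the seven raidz2 members with checksum errors, 48 in total, while the log and cache devices show none.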
At this point, I am not entirely sure what to do. I run ECC RAM and an LSI SAS9305-16I in an HL15 case (so the drives are all plugged into a backplane, and the SAS cables have not been physically touched or adjusted in months).
Any thoughts on how I should proceed? At this point I think the best course of action is to shut down the system, but I really don’t want to do anything that could result in further damage.