Help! TrueNAS Kernel Panic with VERIFY3 error

Hi all,

Thank you to anyone who responds and takes the time to read my post. I have a TrueNAS SCALE system that I’ve been running for nearly 3 years with no issues at all. First, a quick breakdown of my system:

  • ASRock X570M Pro4 M-ATX motherboard
  • Ryzen 9 3950X (in ECO Mode)
  • 64 GB of Kingston ECC RAM
  • 6 × 8 TB HGST Ultrastar He8 (HUH728080ALE601) drives for storage
  • 2 × Intel 240 GB SSDs for boot
  • 2 × Intel P1600X 118 GB Optane drives for SLOG
  • TrueNAS SCALE 24.10.1
  • 1 pool with 3 mirrored VDEVs

About 7 or 8 days ago, one of the hard drives in one of my mirrored VDEVs failed. All the other drives were reporting as healthy, so I decided to offline the failed drive and remove that VDEV entirely, since the pool had enough free space to absorb the data. I then reassigned the now-extra drive as a hot spare for the pool. Everything seemed to run fine afterwards: I ran SMART tests to confirm the remaining drives were good and a scrub to confirm the pool was still healthy, and all checks came back with no errors.

Then last night I noticed my SMB share was inaccessible. I tried to log in to the TrueNAS server and found it was completely frozen, which has never happened before. I rebooted it, and on startup it went directly into a kernel panic. I’ve tried to research what might be causing this, but I haven’t found much aside from one issue that seemed unrelated. So far the only troubleshooting step I can think of is running MemTest86 to rule out bad memory, but beyond that I don’t know what else to do. Any and all help would be greatly appreciated, as this is my production system.
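For reference, the CLI equivalent of what I did looks roughly like this (pool and device names below are placeholders, not my exact ones):

```
# Offline the failed disk (pool/device names here are placeholders)
zpool offline tank /dev/disk/by-id/ata-HGST_HUH728080ALE601_XXXXXXXX

# Evacuate and remove the degraded mirror VDEV; ZFS copies its data to the
# remaining VDEVs and leaves an indirect mapping behind in its place
zpool remove tank mirror-2

# Re-add the surviving disk as a hot spare
zpool add tank spare /dev/disk/by-id/ata-HGST_HUH728080ALE601_YYYYYYYY

# Health checks afterwards: long SMART self-test per drive, then a scrub
smartctl -t long /dev/sdX
zpool scrub tank
zpool status -v tank
```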

If the error mentions syncing, you might try disabling the Optane drives to see if that resolves the issue.

Thanks for the tip! How would I do that if the system goes into a kernel panic immediately after GRUB? Would I make a modification in GRUB, or try booting with all the drives removed?
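For what it’s worth, the GRUB route I had in mind was something like this; I’m assuming OpenZFS’s `zfs_autoimport_disable` module parameter is honored here, which may not be the case since TrueNAS imports pools through its own middleware:

```
# At the GRUB menu: highlight the boot entry, press 'e', append this to the
# end of the line starting with 'linux', then boot with Ctrl-x:
zfs.zfs_autoimport_disable=1

# Intent: skip the automatic pool import at boot so that, if the panic
# happens during import, the system can at least come up.
```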

You could remove the Optane drives physically.
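Or, if you can get the pool imported from any other environment, you should be able to drop the SLOG logically instead; log VDEVs can be removed from a pool at any time (pool/VDEV names below are placeholders, check `zpool status` for the real ones):

```
# Remove the mirrored log VDEV (its name shows up under "logs" in zpool status)
zpool remove tank mirror-1

# Confirm the log device is gone
zpool status -v tank
```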

So I was able to mount my pools using an Ubuntu live boot, and everything looks good. I ran a scrub on both my boot pool and my storage pool, and no errors were found. I tried rebooting into TrueNAS and it failed with the same error as in the original screenshot.
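For anyone following along, this is roughly what I ran from the Ubuntu live session (pool name is a placeholder):

```
# Import the pool under an alternate root so nothing mounts over the live system
sudo zpool import -f -R /mnt tank

# Scrub and check the results
sudo zpool scrub tank
sudo zpool status -v tank

# Export cleanly before rebooting back into TrueNAS
sudo zpool export tank
```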

The issue seems to be related to the indirect-0 device which, from everything I’ve found online, is a “ghost” device that ZFS creates as a pointer to the data evacuated from the removed VDEV. The reason I think they’re connected is that the error states “PANIC at vdev_indirect_mapping.c:528”. However, I have no clue where to go from here, especially since everything in my pool is showing as healthy.
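In case it helps anyone dig into this, here’s how I’ve been inspecting the leftovers of the removal from the live environment; the pool name is a placeholder, and I’m not certain zdb is even the right tool for this:

```
# Print the pool configuration, which should list the indirect-0 device
# left behind by the VDEV removal
sudo zdb -C tank

# Same thing against an exported pool: -e reads the config from the disks
sudo zdb -e -C tank

# zpool status also reports the completed removal and its remapped size
sudo zpool status -v tank
```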

I’d think you would have better luck on the TrueNAS forums for this. You might have come across a rare bug that needs attention.

I know :cry: I’m actually on the TrueNAS forums as well, and I found an open bug on the OpenZFS GitHub that I replied to too. I’m just hoping that between this community and the TrueNAS forum, someone can help me. This is certainly a crappy situation to be in. Thanks for the help and for taking the time to reply, though; I do appreciate it!