ZFS pool import fails with Kernel panic

Hello gentlemen,

My TrueNAS CORE system crashed and can no longer boot. It loops: it hits a kernel panic while trying to import the only pool I have, then reboots.
The root cause seems to be the PCI card, according to this bug thread (I have the same log messages and the same PCI card), but I could not find a way to repair the damage that has been done.

I’m planning to install TrueNAS SCALE on a spare drive and try to import the pool there, given the differences between BSD and Linux and some feedback on the TrueNAS forum. Do you think that’s a good idea, or would you recommend something else?

Below are the logs from just before the kernel panic:

And the kernel panic itself:
As a new user I cannot attach two media files, but the kernel panic hangs at the fork_trampoline() function and is extremely similar to the one attached to the FreeBSD link above.

Thank you,

Not an issue I have encountered; I hope you have a backup. I would first try the same version of TrueNAS it was last working with, so you are not adding more variables to the problem.

Hello gentlemen,

An update on this issue, which should soon be resolved.

The system crashed with the following message: panic: VERIFY(ddt_object_update(ddt, ntype, class, dde, tx) == 0) failed

The hint in this message is “DDT”, the deduplication table that ZFS keeps for the pool and caches in RAM. It so happens that I had dedup enabled on one dataset for testing purposes.
Getting rid of this dataset (copying all data to a dataset without dedup, then removing the dedup-enabled dataset) did not purge the DDT, which was approximately 512GB in size.

There seems to be no way to flush this table manually. It is supposed to be flushed automatically once the deduplicated data is no longer referenced, but in my case it wasn’t, and the DDT was 535GB in size (you can check the table size with zdb -eDD #YourPool).
The only way I found to get rid of this table is to delete the entire pool.
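For anyone who wants to check dedup state on a pool that is currently imported (read-only is enough), here is a minimal sketch. The `run` helper, the `DRY_RUN` switch and the `dedup_report` function name are my own illustration, not anything TrueNAS ships; the pool name Storage is the one from this thread. By default it only prints the commands, so nothing touches the pool until you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: helper names below are illustrative assumptions.

# Print the command when DRY_RUN=1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Show dedup-related information for one imported pool.
dedup_report() {
    pool=$1
    # Pool-wide deduplication ratio (read-only pool property).
    run zpool get dedupratio "$pool"
    # Which datasets have the dedup property set, and where it comes from.
    run zfs get -r -t filesystem,volume dedup "$pool"
    # Pool status plus the deduplication table histogram.
    run zpool status -D "$pool"
}
```

Calling `dedup_report Storage` prints the three commands; `DRY_RUN=0 dedup_report Storage` actually runs them.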

So, the fix in my case was to import the pool read-only (zpool import -f -o readonly=on Storage) and copy everything from this read-only pool to another TrueNAS instance.
Then I’ll have to delete the entire old pool, recreate it, and copy everything back again.
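The rescue steps above can be sketched roughly as follows. Import is a `zpool` subcommand, so the sketch uses `zpool import`. The copy tool is an assumption on my side (rsync is just one option), and the mount point and destination in the usage line are hypothetical. As written it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: helper names, paths and the use of rsync are assumptions.

# Print the command when DRY_RUN=1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Import a pool read-only, then copy its contents elsewhere.
rescue_copy() {
    pool=$1   # pool name, e.g. Storage
    src=$2    # where the pool's datasets end up mounted
    dest=$3   # rsync destination, local path or host:path
    # Read-only import: nothing is written to the pool.
    # The copy only starts if the import succeeded.
    run zpool import -f -o readonly=on "$pool" &&
    run rsync -a "$src/" "$dest/"
}
```

Example with made-up paths: `rescue_copy Storage /mnt/Storage backup-host:/mnt/NewPool`.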

For now I can’t tell whether any data is corrupted. I’ve checked some files manually and they were fine, but I cannot guarantee zero corruption.
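One way to narrow that uncertainty, as a sketch under assumptions: ask ZFS which files it already knows to be damaged, then compare the copy against the source by checksum. The function name and the paths in the usage line are hypothetical. Note that the rsync pass only shows whether the copy matches what the old pool returned; it is the ZFS checksums behind `zpool status -v` that detect on-disk damage.

```shell
#!/bin/sh
# Sketch only: helper names and paths are assumptions.

# Print the command when DRY_RUN=1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Report known data errors, then compare source and copy by content.
verify_copy() {
    pool=$1   # source pool name
    src=$2    # source mount point
    dest=$3   # location of the copy
    # -v lists files affected by data errors that ZFS has recorded.
    run zpool status -v "$pool"
    # -r recurse, -n change nothing, -c compare by checksum;
    # only files that differ are itemized.
    run rsync -rnc --itemize-changes "$src/" "$dest/"
}
```

Example with made-up paths: `verify_copy Storage /mnt/Storage backup-host:/mnt/NewPool`. An empty rsync listing suggests the file contents match.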

My advice: if you want to test dedup, do it on a dedicated pool, not just a dedicated dataset. As we can see here, the DDT is “linked” to the pool and not the dataset, so an error in one dataset can crash your entire pool.
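A throwaway pool for this kind of test can be as small as a single file-backed vdev on a scratch machine. Below is a minimal sketch; the pool name, image path and size in the usage line are hypothetical, and the helper names are mine. Destroying the test pool takes its DDT with it. It prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: helper names, pool name and paths are assumptions.

# Print the command when DRY_RUN=1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create a file-backed pool used only for dedup experiments.
dedup_sandbox_create() {
    pool=$1   # name of the throwaway pool
    img=$2    # absolute path of the backing file
    size=$3   # backing file size, e.g. 4G
    run truncate -s "$size" "$img" &&
    run zpool create "$pool" "$img" &&
    run zfs set dedup=on "$pool"
}

# Destroy the test pool (and its DDT), then remove the backing file.
dedup_sandbox_destroy() {
    pool=$1
    img=$2
    run zpool destroy "$pool" &&
    run rm -f "$img"
}
```

Example with made-up values: `dedup_sandbox_create deduptest /var/tmp/deduptest.img 4G`, and later `dedup_sandbox_destroy deduptest /var/tmp/deduptest.img`.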

@LTS_Tom: a video on dedup, maybe? :wink:

I have tested dedup, but never broke it in that way.