Zfs list shows dataset, but folder does not exist

TLDR: I can’t destroy a dataset. zfs list shows the dataset, but the folder does not exist:

root@truenasbak1[/]# zfs list | grep -F docker/docker
bak1/DATPool1/shares/docker/dockerp1                                                         688M  18.1T   188K  /mnt/bak1/DATPool1/shares/docker/dockerp1
bak1/DATPool1/shares/docker/dockertnp1                                                      47.1G  18.1T  29.4G  /mnt/bak1/DATPool1/shares/docker/dockertnp1
root@truenasbak1[/]# cd /mnt/bak1/DATPool1/shares/docker
root@truenasbak1[/mnt/bak1/DATPool1/shares/docker]# ls -lsa
total 26
9 drwxr-xr-x 3 root root 3 Dec  2 18:38 .
9 drwxr-xr-x 6 root root 6 Mar  8 15:31 ..
9 drwxr-xr-x 7 root root 7 Jan 12 00:04 dockertnp1
root@truenasbak1[/mnt/bak1/DATPool1/shares/docker]#

Long Story:

I have two TrueNAS boxes: a primary and a backup. The pool in question is replicated to the backup TrueNAS server hourly. I removed one of the datasets on the primary server because it was no longer needed, then left it for a while assuming the backup would get synced up. A day or so later, I realized the dataset was still on the backup server/pool, and noticed I was getting an error about being unable to read permissions in the “Datasets” UI. So I went ahead and tried to remove it. When I did, the backup server immediately hard-rebooted. I tried this two more times with the same result.

After doing some more digging, I discovered that zfs list reports the dataset as present, but if you ls the folder, it does not exist as far as Linux is concerned.
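For anyone following along, the mismatch can be cross-checked directly against the kernel’s mount table; these are generic zfs/Linux commands (not output I captured), using the dataset name from above:

# what ZFS thinks about the dataset
zfs get mounted,mountpoint bak1/DATPool1/shares/docker/dockerp1
# what the kernel actually has mounted under that path
mount | grep dockerp1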

Any idea what went wrong, and how do I recover from this? I suppose I could just wipe out the entire pool and recreate it using replication as before (if I don’t run into the same issue), but I’d like to learn what went wrong and how to recover from it the right way (if there is one), and it will take a long time to re-create a 7 TB pool from scratch.
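For context, the “recreate it using replication” route boils down to roughly the following. The primary-side pool name DATPool1 is just inferred from my backup paths above, and zfs recv -F will overwrite whatever is on the target, so treat this as a rough sketch rather than exact commands; in practice I’d let a TrueNAS replication task handle it:

# on the primary: take a recursive snapshot of the whole dataset tree
zfs snapshot -r DATPool1@rebuild
# send the full tree to the backup box and receive it under bak1
zfs send -R DATPool1@rebuild | ssh truenasbak1 zfs recv -F bak1/DATPool1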

Thanks in Advance!

I am assuming it’s this bug here:

Which has since been fixed, but I am not aware of any way it can be fixed in an affected pool other than rebuilding the pool. I had this happen to one of my systems, which led to me shuffling around about 30 TB of data so I could rebuild that pool.

Thanks for the insight! I’ll read through and dig into it after work today.

I posted the same question on r/zfs, and someone suggested checking whether the dataset was mounted, and it appears it is not:

root@truenasbak1[~]# zfs get mounted bak1/DATPool1/shares/docker/dockerp1
NAME                                  PROPERTY  VALUE    SOURCE
bak1/DATPool1/shares/docker/dockerp1  mounted   no       -
root@truenasbak1[~]#

But I have not yet tried to mount it and then destroy the dataset. I’ll try that tonight, and if that doesn’t work, blow away the pool.
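The plan is roughly the commands below; the -n/-v dry run first seems wise given that the last three attempts rebooted the box:

# try to mount the stray dataset first
zfs mount bak1/DATPool1/shares/docker/dockerp1
# dry-run the destroy to see what it would take with it
zfs destroy -rnv bak1/DATPool1/shares/docker/dockerp1
# then the real thing, if the dry run looks sane
zfs destroy -r bak1/DATPool1/shares/docker/dockerp1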

Thanks again!
