pfSense ZFS faulted/corrupted drive

I’m just posting this in case someone runs into the same issue. Keep in mind I’m just a regular residential user who has been using pfSense for about five years. I use one of those Qotom devices for my router. I installed pfSense on two 64GB drives with the ZFS mirror option. This particular box I’ve been running for about two years. I planned on using my other Qotom for a Minecraft server but never got around to it.

Recently I ran into an issue. I had updated pfBlocker and it did its updating thing. A while later I noticed I couldn’t even get to the login page. It would time out, but I was still able to get out on the web. I had no way to properly halt the system, even right at the router. I did a hard power cycle and it went into a continuous boot loop at the BIOS screen, showing an error code of A2 at the bottom right of the screen. The BIOS screen would literally pop up for a second, then it would reboot, over and over. It turns out I had a bad drive. Was it the SSD or the mSATA? It was the mSATA that was causing the loop. I had never seen anything like it, and why did it only show up at startup?

Luckily the other drive took over once I pulled the faulty one. The drive is under warranty but that’s always a process, so I ordered a new one. It got here and it was time to go to work. I’m not familiar with ZFS at all. I had to go through the browser because when I SSH’d into the router, certain zpool commands were giving permission errors.
With the old drive removed I did a pool status and came to this page. That link did me no good whatsoever.
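
For anyone else staring at pool status output for the first time, here is a minimal Python sketch that picks out the rows that are not ONLINE. It assumes the usual `zpool status` layout, where a `NAME STATE ...` header row is followed by one row per pool, vdev, and device, ending at a blank line. The function names are made up for illustration, and this is not something pfSense ships.

```python
import subprocess


def read_zpool_status():
    """Run `zpool status` and return its text output (needs sufficient privileges)."""
    return subprocess.run(
        ["zpool", "status"], capture_output=True, text=True, check=True
    ).stdout


def unhealthy_devices(status_text):
    """Return (name, state) pairs for every config row whose state is not ONLINE."""
    results = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            # Header row: the device table starts on the next line.
            in_config = True
            continue
        if in_config:
            if not fields:
                # A blank line ends the device table for this pool.
                in_config = False
                continue
            if len(fields) >= 2 and fields[1] != "ONLINE":
                results.append((fields[0], fields[1]))
    return results
```

Calling `unhealthy_devices(read_zpool_status())` on the router should list the degraded pool, the degraded mirror, and the missing or faulted member. Anything that is not ONLINE gets reported, so a hot spare showing AVAIL would be listed too.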

I shut it down and installed the new mSATA drive. The BIOS saw the drive but pfSense didn’t. After a bit, I fired up PartedMagic and made sure the drive had a partition table. I booted up the router and it now saw the drive at least.

I searched around to figure out what to do next but wasn’t having much luck. I finally came to this site, which helped me get the new drive online and resilvered: https://farrokhi.net/posts/2020/05/replacing-a-faulty-disk-in-zfs/
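
As a rough illustration of the general shape of a mirror-member replacement on FreeBSD (not necessarily the exact steps from the linked article), here is a small Python sketch that only prints the commands for review and runs nothing. The pool name `zroot`, the disk names `ada0`/`ada1`, and the partition index `3` are all assumptions, so check your own `zpool status` and `gpart show` output first. It also leaves out reinstalling boot code on the new disk, which depends on how the box boots.

```python
import shlex


def replacement_commands(pool, old_member, healthy_disk, new_disk, zfs_partition_index):
    """Build (but do not run) the commands for swapping in a new mirror member."""
    q = shlex.quote
    # FreeBSD names GPT partitions <disk>p<index>, e.g. ada1p3.
    new_member = f"{new_disk}p{zfs_partition_index}"
    return [
        # Copy the partition layout from the surviving disk to the new one.
        # -F destroys any partition table already on the new disk.
        f"gpart backup {q(healthy_disk)} | gpart restore -F {q(new_disk)}",
        # Swap the failed member for the new partition; resilvering starts here.
        f"zpool replace {q(pool)} {q(old_member)} {q(new_member)}",
        # Re-run this to watch resilver progress.
        f"zpool status {q(pool)}",
    ]


# Hypothetical values for illustration only.
for command in replacement_commands("zroot", "ada1p3", "ada0", "ada1", 3):
    print(command)
```

Printing instead of executing is deliberate: pointing `gpart restore -F` at the wrong disk wipes it, so it is worth reading each line before pasting it into a root shell. If the old member shows up in pool status as a numeric ID rather than a device name, that ID is what goes in `old_member`.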

Now it’s time for a drink!

I can answer the “why did it happen at boot” question… pfSense runs most things in RAM, so you might have a completely functioning system, do a reboot, and not get it back. They even specify in the update/upgrade instructions that you should reboot first, just to make sure you don’t have a hardware problem.

Now, why the mirror didn’t automatically switch over, I have no idea. That’s what it was supposed to do, which is why you configured it that way.

Thanks. I did know it mostly ran in RAM. I have the SMART status on my dashboard and I’m in there often enough looking at traffic graphs. I would have assumed it would have issued a fail or caution on the status. Nothing is perfect. I get that for sure.

I had another drive fail but this time the mirror switched over. I’m not sure what’s going on with the failing drives. Maybe many writes? I can’t see overheating being an issue.

Do you do a lot of logging? That might write to the drives enough to make them wear out.

What brand of drives are you using?

I believe the only things being logged are system logs.

KingSpec mSATA drive. There aren’t many affordable options for mSATA.

How about an m.2 to mSATA adapter? Or an mSATA to 2.5 inch drive adapter?

That said, the KingSpec drives are probably not terrible when you look at the actual chips and controller in use, probably Micron chips if I had to guess.