I was thinking about the SMR drive woes and wondering if they could explain an issue I had with an HDD backup system where the write speed unaccountably fell drastically somewhere after 200 GB written.
Then it occurred to me that SSD drives have a similar architecture to SMR drives, with a small fast cache and a dog-slow main storage area.
Doesn’t this mean that no SSD is suitable for RAID, just as no SMR drive is?
I would need to know the model of the drive you are talking about.
SSDs are fine. The SMR issues relate to mechanical drives and come from physical limitations that solid-state media don’t have, though keeping an SSD below about 75% full is similarly recommended for performance reasons.
My understanding is that all SSDs except a few very expensive enterprise models have a combination of dog-slow MLC flash, SLC flash cache, and DDR cache. So their write speeds will fall to some fraction of their rated speeds once the SLC cache fills – which is highly likely during resilvering.
Just an example:
I imagine the 80 MB/s MLC speed is set by the flash, not the controller, so presumably an NVMe drive will be just as slow once its cache is full – i.e. perhaps 30 times slower than its normal speed.
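To put rough numbers on that worry, here is a minimal sketch of a two-speed write model: data lands in the cache at the fast rate until the cache is full, and everything after that is written at the slow native rate. The 80 MB/s figure is the one mentioned above; the 42 GB cache size and 520 MB/s cached speed are illustrative assumptions, not the specs of any particular drive, and the model ignores the cache draining in the background during the write.

```python
def sustained_write_seconds(total_gb, cache_gb, fast_mb_s, slow_mb_s):
    """Seconds to write total_gb under a simple two-speed model."""
    # Whatever fits in the cache is written at the fast rate.
    cached_gb = min(total_gb, cache_gb)
    # The remainder goes straight to the slow native flash.
    direct_gb = total_gb - cached_gb
    # Using 1 GB = 1000 MB, as drive makers do.
    return cached_gb * 1000 / fast_mb_s + direct_gb * 1000 / slow_mb_s


def average_mb_s(total_gb, cache_gb, fast_mb_s, slow_mb_s):
    """Average throughput in MB/s over the whole write."""
    secs = sustained_write_seconds(total_gb, cache_gb, fast_mb_s, slow_mb_s)
    return total_gb * 1000 / secs


if __name__ == "__main__":
    # Assumed numbers: 42 GB cache, 520 MB/s cached, 80 MB/s native.
    for total_gb in (20, 100, 500):
        print(total_gb, "GB ->", round(average_mb_s(total_gb, 42, 520, 80)), "MB/s average")
```

Under this model the average speed only approaches the slow rate when the write is many times larger than the cache, so the size of the cache relative to the amount of data being rebuilt matters as much as the two speeds themselves.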
You’re mixing many things together, and your idea that all SSDs are bad for RAID is wrong. It’s not just “a few very expensive enterprise models” that have good sustained write speeds.
Yes, MLC literally means Multi Level Cell, but in practice it always means two bits per cell. TLC means three bits per cell (Triple), and QLC means four bits per cell (Quad). The more bits per cell, the longer it takes to program each cell, because the voltage has to be tuned to exactly the right level. One trick drive makers use is called Pseudo-SLC, where writes are initially performed using just one bit per cell, and a background process copies the data at full bit depth as soon as it can. The size of the Pseudo-SLC cache is variable, up to 50% of the drive (as opposed to a real SLC cache, which is fixed in size and very small), which is one reason why no more than 75-80% of an SSD should be filled. This is also why SSDs NEED to have TRIM - the controller needs to know what it can overwrite as it shuffles the existing and new data around.
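The bits-per-cell point can be made concrete with a little arithmetic. A cell storing n bits has to distinguish 2^n voltage levels, and free flash run in one-bit mode holds only 1/n of what it holds at full bit depth. The sketch below assumes, as an upper bound, that the controller can use all free space as Pseudo-SLC; real firmware usually caps it lower, and the function names are made up for illustration.

```python
# Bits stored per cell for each flash type discussed above.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def voltage_levels(bits_per_cell):
    """Distinct voltage levels a cell must hold to store that many bits."""
    return 2 ** bits_per_cell


def max_pseudo_slc_gb(capacity_gb, used_gb, bits_per_cell):
    """Upper bound on Pseudo-SLC cache size, in GB of user data.

    Assumption: every free cell can be borrowed as one-bit cache, so the
    free space shrinks by a factor of bits_per_cell when used that way.
    """
    free_gb = max(capacity_gb - used_gb, 0)
    return free_gb / bits_per_cell


if __name__ == "__main__":
    for name, bits in CELL_TYPES.items():
        print(name, "needs", voltage_levels(bits), "voltage levels")
    # Hypothetical 1000 GB QLC drive at different fill levels.
    for fill in (0.25, 0.50, 0.80):
        cache = max_pseudo_slc_gb(1000, 1000 * fill, CELL_TYPES["QLC"])
        print(f"{fill:.0%} full -> at most {cache:.0f} GB of Pseudo-SLC")
```

For two-bit MLC the bound works out to half the free space, and it shrinks as the drive fills, which is the reasoning behind leaving a good share of the drive empty.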
That Samsung QVO drive you quoted is QLC. It is also NOT intended for anything beyond regular consumer or office worker usage. In those cases, there will rarely be a sustained write long enough to exhaust the Pseudo-SLC cache space, meaning that it will usually be as performant as other drives.
Enterprise drives nowadays, even ones with many gigabit/s of sustained writes, are using TLC. Here’s one from 2017 that uses TLC: https://www.storagereview.com/review/samsung-pm1725-ssd-review . So the mere fact that a drive uses MLC or TLC doesn’t mean it will have poor sustained write performance. But you don’t have to go to the very high end enterprise drives to get good sustained writes - middle or high end consumer drives are good performers too. HOWEVER, this can’t be assumed either, as most of the currently available PCIe4 NVMe drives are slower than middle/high end PCIe3 drives (such as the Samsung 970 Evo) once they exhaust their Pseudo-SLC space.
Without knowing the exact drives you had in your storage array (please tell us), I would suspect that you were using drives not suited to sustained writes, and/or that your RAID system doesn’t pass TRIM through to the drives (ZFS on Linux only recently gained this and it isn’t on by default, ZFS on FreeNAS is less clear, nearly all hardware RAID controllers don’t support TRIM, and MDADM software RAID on Linux doesn’t have TRIM enabled by default AFAIK).
TRIM on ZFS was added to FreeBSD back in version 10 and was enabled by default, so the latest versions of FreeNAS do support it and should have it enabled.
Easily checkable from FreeNAS command shell:
edit: just checked on FreeNAS 11.3 and it is enabled by default
To brwainer: what you said seems to confirm what I said. Of the SSDs installed in RAID arrays, what fraction do you think meet your exacting requirements? How many people are aware of those requirements? How many RAID arrays are more than half full and will, during resilvering, burn through even the pseudo-SLC, which you said is no more than 50% of capacity?
I believe people need to be warned not to use SSDs for RAID, just as we have now decided to warn them about SMR HDDs.
Again you are missing the point and making things too simplistic. Just like HDDs have categories (most clearly shown by WD’s color scheme - Green for low end consumer meant for mostly idle and low power draw, Black for consumer performance, Red for NAS/RAID, Purple for surveillance, Gold for enterprise) SSDs have categories (read/cost optimized consumer, write optimized consumer/prosumer, read optimized enterprise, write optimized enterprise - with enterprise generally but not always meaning full power loss protection). The problem is that other than consumer vs enterprise, the categories aren’t obvious, and also there wasn’t a NAS/RAID prosumer category until recently. NAS/RED SSDs are available from Seagate and WD.
If you really want to go on an education crusade, there isn’t an easy way to say “avoid read optimized SSDs that lack PLP” - not as easy as saying “avoid SMR drives”. But now it is easy to say “buy something labeled NAS or RED, just avoid the RED drives ending in EFAX”.
EDIT: By “easy” I mean giving advice to people who don’t want to do a lot of research to verify whether a drive has good sustained write performance, and what level of power protection.