Maximizing Recording Performance

I have a PCIe 5.0 card that’s sending video to the PC’s main memory at approximately 10 Gbytes/s.
I need to record 20 TB of this data without losing a single bit.
All solutions are on the table: using RAID 0 with multiple SSDs / recording raw data without a file system / Linux / Windows / etc…
What would you do to achieve the high throughput requirement?

10 Gbytes/s is equal to 80 Gbits/s, and 45Drives has an all-flash NAS that can almost do that if you build it out with 100G interfaces. Or maybe put whatever is capturing the data right into the Stornado server.

Thanks for your input.
But I’m trying to build my own custom solution around an HEDT.
Ultimately, it’s a high performance recording device that takes PCIe data and records it - and that’s all it has to do. So I can do without many of the features that regular computer storage has to support.

What I had in mind is 4 NVMe devices (either Gen 4 or Gen 5) in a Linux software RAID 0 configuration, or maybe going without a file system altogether and striping the raw data across the devices from a user application.
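For scale (rough numbers): 10 GB/s split across 4 drives is about 2.5 GB/s of sustained writing per drive, 20 TB spread across 4 drives is at least 5 TB per drive, and the whole recording takes about 2,000 seconds, roughly 33 minutes of continuous writing.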

I’m not a Linux / software expert, so the purpose of this post is to get an idea of which approach will yield the best performance.

RAID 0 has no fault tolerance, so losing one drive loses the whole array. Not something I have ever looked into, but I assume there are motherboards with 4 or more NVMe slots.

From your experience, would writing raw data to disk bring higher performance than doing so through a file system?

You can probably get what you need from 4 NVMe PCIe 4.0 x4 or PCIe 5.0 x4 drives in RAID 10. More drives = better. The problem you are going to have is one of drive size; you are going to need a small fortune’s worth of drives to build a pool big enough to handle any realistic amount of recording.

You are also going to need to build it and test it; don’t be surprised when the speeds drop after the initial buffer fills.
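Even a crude probe will show that falloff. A rough sketch of what I mean (Linux, O_DIRECT, a placeholder device path, not a tuned benchmark; writing to a raw device wipes it, so point it at test hardware only):

```c
/*
 * Rough sustained-write probe (a sketch, not a tuned benchmark).
 * It streams large aligned blocks to one device with O_DIRECT and prints
 * the throughput about once per second, so the drop after the drive's
 * cache fills is easy to spot. The default path is a placeholder, and
 * writing to a raw device destroys whatever is on it.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE (64UL * 1024 * 1024)            /* 64 MiB per write */
#define TOTAL    (256ULL * 1024 * 1024 * 1024)   /* enough to outrun the cache */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/dev/nvme0n1";  /* placeholder target */
    int fd = open(path, O_WRONLY | O_DIRECT);
    if (fd < 0) { perror(path); return 1; }

    /* O_DIRECT needs an aligned buffer; 4 KiB covers typical NVMe sector sizes. */
    void *buf;
    if (posix_memalign(&buf, 4096, BUF_SIZE) != 0) return 1;
    memset(buf, 0x5A, BUF_SIZE);

    double t0 = now_sec(), t_last = t0;
    off_t off = 0;
    unsigned long long since_last = 0;

    while ((unsigned long long)off < TOTAL) {
        ssize_t n = pwrite(fd, buf, BUF_SIZE, off);
        if (n != (ssize_t)BUF_SIZE) { perror("pwrite"); break; }
        off += n;
        since_last += (unsigned long long)n;

        double t = now_sec();
        if (t - t_last >= 1.0) {                 /* report about once per second */
            printf("%6.1f s  %6.2f GB/s\n", t - t0, since_last / (t - t_last) / 1e9);
            fflush(stdout);
            t_last = t;
            since_last = 0;
        }
    }
    close(fd);
    free(buf);
    return 0;
}
```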

Or maybe we are mixing bytes and bits: if you are only doing 10 gigabit per second (4K “uncompressed”) coming from a camera over ST 2110, then more common NAS solutions are fine.

Not something that I have put any thought into as I am always writing to a file system.

Not a typo. I wrote and meant 10 Gbytes/s (80 Gbits/s).
It’s a multispectral camera with a very high frame rate.
The solution doesn’t have to be “low cost”, just “cost efficient”, i.e. meet the bandwidth / volume requirement without over-engineering.

I had a feeling it was some kind of scientific camera, but always like to be very clear.

I was able to get some impressive results in a benchmark to a single M.2 NVMe drive with 4 lanes of PCIe 3.0, so 4.0 x4 or 5.0 x4 should be even better. My concern is that every company cheats: they quote the buffer (cache) speeds, and sometimes those buffers are hard to fill in a benchmark. Your use case is going to be filling every drive to about 80% to 90% of capacity if you keep the cost anywhere close to lean.

I would suggest a fast processor with LOTS of PCIe lanes. Look at the difference between M.2 and U.2 drives to see if you can find sustained data rates vs buffer data rates. I would say you need 24 TB to 32 TB of raw capacity, and you might try variations of RAID 10 when testing file systems, so something like 8x8 TB or 10x8 TB drives (mirrored pairs with a stripe across them). You will want to compare software RAID vs hardware RAID; sometimes software can be faster. I recently looked at an AMD-powered Supermicro server that had up to 14 NVMe drives (2 for the OS); this might be a starting point.
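To put rough numbers on that (assuming the 10 GB/s figure holds): 8x8 TB in mirrored pairs with a stripe across them gives about 32 TB usable, the stripe divides the stream into roughly 2.5 GB/s per pair, and since each write goes to both drives of a pair, every individual drive has to sustain about 2.5 GB/s. With 10x8 TB that drops to about 2 GB/s per drive and gives 40 TB usable.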

What OS does the capture card need? ext4 (possibly on top of LVM) might be a choice for the file system under Linux; NTFS is probably the choice for Windows, but I’m not sure it will keep up.

The capture card needs Linux.
With software RAID the straightforward approach would be to set up the array using a standard utility such as mdadm; from that point on the application would see a single virtual storage device, with the data striping handled behind the scenes. That’s very cool and probably what I would do if I wanted to speed up the storage performance of my desktop.

But I’m thinking, perhaps I can leverage the simplicity of the dataflow in my use case to design something simpler than a fully featured software RAID and gain performance in the process.

Main points to consider:
the throughput is constant; I don’t need random access to the data; I can guarantee I’m starting to write to a completely empty device; and every new block of data will be written exactly after the previous one ended.

What if the data striping is handled by the user’s program? A block of PCIe data (of constant size) would be split into a number of files (equal to the number of physical SSD devices), with all files written simultaneously, each to its matching SSD, and everything handled by the user’s program.
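Sketching it out just to make the idea concrete (this assumes Linux with liburing available, uses placeholder device paths and made-up block sizes, and skips most error handling):

```c
/*
 * Sketch of user-space striping: split each fixed-size capture block into
 * one chunk per drive and submit all chunk writes as a single batch.
 * Assumes liburing is installed (link with -luring). Device paths, block
 * sizes, and the number of blocks are placeholders -- writing to a raw
 * NVMe namespace destroys whatever is on it.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NDRIVES    4
#define BLOCK_SIZE (64UL * 1024 * 1024)     /* one capture block from the card */
#define CHUNK_SIZE (BLOCK_SIZE / NDRIVES)   /* slice written to each drive     */

int main(void)
{
    const char *paths[NDRIVES] = {          /* hypothetical target devices */
        "/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1", "/dev/nvme3n1"
    };
    int fds[NDRIVES];
    for (int i = 0; i < NDRIVES; i++) {
        fds[i] = open(paths[i], O_WRONLY | O_DIRECT);
        if (fds[i] < 0) { perror(paths[i]); return 1; }
    }

    /* O_DIRECT needs aligned buffers; 4 KiB covers typical NVMe sector sizes. */
    void *block;
    if (posix_memalign(&block, 4096, BLOCK_SIZE) != 0) return 1;
    memset(block, 0xAB, BLOCK_SIZE);        /* stand-in for the PCIe data */

    struct io_uring ring;
    if (io_uring_queue_init(2 * NDRIVES, &ring, 0) != 0) return 1;

    off_t offset = 0;                       /* same offset on every drive */
    for (int blk = 0; blk < 16; blk++) {    /* a handful of blocks for the demo */
        /* Queue one chunk per drive, then submit the whole batch at once. */
        for (int i = 0; i < NDRIVES; i++) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_write(sqe, fds[i], (char *)block + i * CHUNK_SIZE,
                                CHUNK_SIZE, offset);
        }
        io_uring_submit(&ring);

        /* Wait for all chunks to land before reusing the buffer. */
        for (int i = 0; i < NDRIVES; i++) {
            struct io_uring_cqe *cqe;
            if (io_uring_wait_cqe(&ring, &cqe) != 0) return 1;
            if (cqe->res != (int)CHUNK_SIZE)
                fprintf(stderr, "short or failed write: %d\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }
        offset += CHUNK_SIZE;               /* advance to the next stripe */
    }

    io_uring_queue_exit(&ring);
    for (int i = 0; i < NDRIVES; i++) close(fds[i]);
    free(block);
    return 0;
}
```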

You are getting into areas that are far outside normal IT territory, so I have a feeling there isn’t going to be much more anyone can suggest. The idea of having the capture tool split the data across the storage is definitely a method that could work; you might even use it to break the spectra out onto different drives. A lot of this depends on how the data is handled in the capture card. Breaking it up across the drives might also let you stay closer to cache sizes and the fast speeds, but the data rates are still high no matter what you do.

And thinking about this, assuming the capture card is only a single slot, you are dealing with 16 lanes for capture, so 16 lanes’ worth of storage should be close to fast enough. You might prototype this with 4 drives, but for data that can’t be lost, 4 mirrored pairs (or more) are probably best. I don’t think a conventional RAID 5+ system is going to be the winner here, but more drives equals faster speeds, up to a point.
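Rough lane math (ignoring protocol overhead): a PCIe 4.0 lane moves on the order of 2 GB/s, so a x16 slot is roughly 32 GB/s of raw bandwidth against a 10 GB/s stream, and four Gen 4 x4 drives use exactly 16 lanes while each needing to sustain about 2.5 GB/s; Gen 5 doubles the per-lane figure.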

A twist to this might be: can you get a mainboard/processor that supports something like 30 TB of RAM? Even DDR4 will be faster than storage and cheaper than DDR5; then bleed that off to fast storage and network it out to huge but slower storage. Might be feature creep and budget bloat, but it should be a workable solution.
