Suggestion for Homelab - SuperMicro, InfiniBand, FreeNAS/Ceph/OpenZFS and backup to offsite large LTO Tape library

Hi all

I want to set up a large HPC cluster with a 40Gb QSFP InfiniBand interconnect and roughly 380TB of storage.

I have the following hardware.
1. Supermicro 2U, 4-node rack server. Each node has 2x Xeon E5-2670, 128GB RAM and 2x QSFP ports.
2. Intel TrueScale 12200-18, an 18-port 40Gb switch.
3. 12x QSFP 40Gb DAC cables (1 metre).
4. Another Intel 1U, 2-node server. Each node has 2x Xeon E5-2620 and 128GB RAM and is attached to 2x NetApp DS4246 disk shelves. Each shelf has 24 drive bays loaded with 24x 8TB WD drives, and its IOM6 modules connect to the Intel server, which hosts an LSI 9300-8e 12Gbps 8-port SAS SGL PCI-E host bus adapter (this will act as the storage server; a rough capacity sketch follows after this list).
5. Dell PowerVault 124T LTO-5 library
6. Netgear ProSafe 24-port Gigabit switch
7. I want the entire VM lab to operate over the IB network, using about a dozen VLANs to split the segments up. The gigabit network is just there to get traffic from the lab out to the web and back in.
8. I want the storage subsystem to operate in iSCSI mode, with active-active failover across the IB links.
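As a back-of-the-envelope check on the 380TB figure, here is a rough capacity sketch. The RAIDZ2 layout is purely my assumption for illustration; the only numbers taken from the post are 2 shelves x 24 bays of 8TB drives.

```python
# Rough capacity estimate for the 2x DS4246 shelves (assumed RAIDZ2 layout).
DRIVES = 48            # 2 shelves x 24 bays
DRIVE_TB = 8           # marketing terabytes (10**12 bytes)

raw_tb = DRIVES * DRIVE_TB                 # 384 TB raw -- hence the ~380TB figure
raw_tib = raw_tb * 10**12 / 2**40          # ~349 TiB as the OS will report it

VDEVS = 6              # hypothetical layout: 6x 8-drive RAIDZ2 vdevs
PARITY_PER_VDEV = 2
usable_tb = (DRIVES - VDEVS * PARITY_PER_VDEV) * DRIVE_TB   # 288 TB before ZFS overhead

print(f"raw: {raw_tb} TB (~{raw_tib:.0f} TiB), usable before overhead: ~{usable_tb} TB")
```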
I am not sure XCP-ng and FreeNAS will be able to handle all this.
I am thinking of installing Qlustar on all nodes and trying it out.
Tom, please comment.
Many thanks

Sounds fun, but FreeNAS does not have any built-in support for Ceph, so you will have to go with a different solution for that.

@nmal55, in regards to Ceph, many people use Proxmox as their HCI software solution. Red Hat RHHI is really solid too, but it uses Gluster rather than Ceph. I have personally deployed Ceph natively on RHEL and it wasn't the easiest thing, but the documentation is much better now than it was 5 years ago.

However, in your case it looks like you have a centralized storage design as opposed to a distributed one. With your current hardware layout, I would look into deploying FreeNAS, if possible in a dual-controller design. I know TrueNAS supports it, but I haven't seen it with FreeNAS; that might be one of its shortcomings compared to commercial options such as NetApp or EMC.

Also, for a dual-controller setup, I'm pretty sure you would need SAS drives, so you would have to make sure that is what you are running.

Even if you are only able to run it on one node, it would still be a good lab setup. You could set up MPIO iSCSI to your XCP-ng hosts and maybe add some SSD cache for performance.
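If it helps, here is a rough sketch of the SR side of that via the XenAPI Python bindings. The pool master address, credentials, portal IP, IQN and SCSI ID are all placeholders, and multipathing itself still has to be enabled per host (e.g. through Xen Orchestra) before the second IB path does anything.

```python
# Hypothetical sketch: create a shared lvmoiscsi SR on an XCP-ng pool.
# All addresses, credentials and identifiers below are placeholders.
import XenAPI

session = XenAPI.Session("https://xcpng-master.lab.local")   # placeholder pool master
session.xenapi.login_with_password("root", "password")
try:
    host = session.xenapi.host.get_all()[0]   # any host ref works for a shared SR
    device_config = {
        "target": "10.10.10.10",                         # placeholder iSCSI portal IP
        "targetIQN": "iqn.2005-10.org.freenas.ctl:lab",  # placeholder target IQN
        "SCSIid": "36589cfc000000...",                   # LUN SCSI id found via SR.probe
    }
    sr = session.xenapi.SR.create(
        host, device_config,
        "0",                                  # physical_size: sized from the LUN for iSCSI
        "FreeNAS iSCSI", "Lab storage over IB",
        "lvmoiscsi", "user",
        True,                                 # shared across the pool
        {},
    )
    print("Created SR:", sr)
finally:
    session.xenapi.session.logout()
```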

I am not aware of any company that designs dual-controller setups for FreeNAS, but TrueNAS is FreeNAS running on very specific iXsystems hardware designed by them, including the dual-controller offerings and a support package.

I am a bit lost here… Fred and Tom… many thanks for your suggestions…

I do have an IBM FlashSystem 820 with ~4TB of flash modules (8x 500GB, which can be set to RAID 5) and 2x 8Gb Fibre Channel I/O.

Just wondering, if I were to use FreeNAS on a single node, can I use this flash system for caching in front of the large spinning rust?

The practical use case for this would be a large database that will be accessed frequently by multiple users.

Many thanks

You can’t use it as a cache for the spinning disks, but you could have two SANs and mount LUNs from each to your virtual hosts. Just make sure you put the virtual disks on the correct datastores/LUNs.

Tom …

How’s InfiniBand support in FreeNAS?

I know you are playing with the higher-end TrueNAS systems. I suppose they support QDR/FDR? Any implementation issues?

I know 40Gb switches are cheap. I bought two 18-port ones on eBay for 40 pounds each; they are both Intel-branded Mellanox switches (12200-18 model).

Any videos on that, please? More on the technical aspect, just like your other videos diving into the details of hardware, drivers, utilities and speed testing. I know that the storage disks will be the bottleneck, but with PCIe NVMe caching drives you should see some performance boost.
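For a rough sense of where the bottleneck sits, here is some back-of-the-envelope arithmetic. The per-device figures are generic assumptions on my part, not measurements of this hardware.

```python
# Assumed throughput figures, just to compare orders of magnitude.
LINK_GBPS = 40                      # QDR InfiniBand / 40GbE link
link_gbytes = LINK_GBPS / 8         # ~5 GB/s theoretical, less after protocol overhead

HDD_SEQ_MB = 180                    # assumed sequential MB/s for one 8TB SATA drive
HDDS = 24                           # one DS4246 shelf
shelf_gbytes = HDDS * HDD_SEQ_MB / 1000   # ~4.3 GB/s best-case sequential

NVME_SEQ_GB = 3.0                   # assumed sequential GB/s for one PCIe 3.0 NVMe drive

print(f"40Gb link:      ~{link_gbytes:.1f} GB/s")
print(f"24x HDD (seq):  ~{shelf_gbytes:.1f} GB/s sequential, far less for random I/O")
print(f"1x NVMe cache:  ~{NVME_SEQ_GB:.1f} GB/s, plus much better random IOPS")
```

So on these rough numbers a single shelf can nearly saturate one 40Gb link for sequential reads, and it is random I/O where an NVMe cache device would actually show a boost.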

The guys at iXsystems should be able to send you some gear to test it out, to see the true potential of their hardware.

I imagine the redundant motherboard architecture and the backplane that they connect to must have some sort of InfiniBand interconnect for instant failover, and maybe some sort of heartbeat connection between the two motherboards.

Many thanks in advance.

I reviewed their hardware here with the 40Gb connection

Hi, good morning from the UK…

Wow… that’s like 06:40 this morning and you are at work already, Tom.

I like the work ethic… thanks for the video.

I will review it later… busy here at the hospital… I am a doc…

I just wanted some specific details on the hardware RAID cards and the 40GbE cards (are they Ethernet or InfiniBand?) used in that M50 device; their website doesn’t give details of those. Also, how does it connect to a cluster that has 40GbE interfaces, what 40GbE switches to use, and so on…

thanks again.

Cisco N9Ks are good and pretty cheap 40Gbit switches. Whatever you buy, make sure it has large buffers.

You don’t have the hardware to support a proper Ceph build as you only have one storage node.

hi sd…

What’s the minimum requirement for a Ceph architecture? (At least 3 storage nodes?)

I know that it scales well horizontally and is a good example of distributed storage. As far as I can understand, FreeNAS has no distributed architecture.

TrueCommand from iXsystems mentions distributed storage, but it has no comparable architecture for block, object and file storage…

I just installed TrueCommand as a container with their latest version 1.2 and connected 3 FreeNAS nodes. It still needs a lot more work to get distributed storage working.
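For anyone else trying it, here is roughly how I would script that container with the Docker SDK for Python. The image tag, port mapping and data path are my assumptions from memory, so check iXsystems’ current instructions before relying on them.

```python
# Hypothetical sketch: run a TrueCommand container via the Docker SDK for Python.
# Image name/tag, port mapping and data path are assumptions -- verify against iXsystems docs.
import docker

client = docker.from_env()
container = client.containers.run(
    "ixsystems/truecommand:latest",          # assumed image name/tag
    name="truecommand",
    detach=True,
    ports={"80/tcp": 8080},                  # assumed: web UI on container port 80
    volumes={"/opt/truecommand": {"bind": "/data", "mode": "rw"}},   # persistent config
    restart_policy={"Name": "unless-stopped"},
)
print("TrueCommand container id:", container.short_id)
```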

The TrueCommand dashboard looks very pretty, and I have ZeroTier installed on the FreeNAS nodes, which are in geographically separate locations. Cool, with a lot of potential, but it needs a lot of polishing. I will file some bug reports…

I know Tom has said he will be looking at TrueCommand… let’s wait for his video.


If you are going to do Ceph right, you should have six nodes plus a separate admin system for Calamari: three of the systems run the monitors and the other three run OSDs. When you want to add storage, you just add OSD nodes.

In a lab environment, though, you should be able to run it on one system or, better yet, virtualize the above design.
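Once a cluster along those lines is up (even virtualized), a quick sanity check from a client is easy with the librados Python bindings. The config path and pool name below are assumptions; the pool has to exist already.

```python
# Minimal librados sanity check against a running Ceph cluster.
# Assumes /etc/ceph/ceph.conf plus a client keyring, and an existing pool named "labpool".
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    stats = cluster.get_cluster_stats()
    print("cluster kB:", stats["kb"], "used kB:", stats["kb_used"])

    ioctx = cluster.open_ioctx("labpool")
    try:
        ioctx.write_full("hello", b"object written from the lab client")
        print("read back:", ioctx.read("hello"))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```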

Thanks Fred, I will try it… I should be able to set up some lab stuff.