XCP-NG installation -- machine locks up -- looking for solutions

Hi, sorry to bother everyone with this thread, but I’m looking for best-practice advice.

XCP-ng 7.6 installed on a Protectli device (Intel i5, 16 GB RAM)
XCP-ng hypervisor running two VMs: pfSense (4 GB RAM dedicated) and Ubuntu running XO (4 GB RAM dedicated).

I’ve had a very mixed experience with this setup. Either the host (xcp-ng) or the pfSense VM will completely lock up 2-3 times per week. I’ve described this problem numerous times on the xcp-ng forum and really haven’t gotten anywhere. I installed netdata on dom0, and unfortunately this hasn’t helped provide any useful information as to why the device (dom0) or pfSense locks up. If it’s only the VM, I can restart the toolstack; if it’s dom0, I have to manually power cycle the device to get it working again. The lockups are intermittent and I don’t really know where to go from here. Perhaps it’s a hardware issue, perhaps not, but I’m not sure how to even debug a hardware issue if there is one.

I’m contemplating just reinstalling everything from scratch. I like the features of xcp-ng, but I’m wondering whether I should try another hypervisor. I’m using this at home, so I have some flexibility to try things. Perhaps I should just install pfSense directly on the Protectli, but given the hardware in the box that would be an awfully expensive router. Looking for some advice at this juncture.
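For intermittent lockups like the ones described above, one way to narrow down when the host froze is to look for long silent gaps in a dom0 log after the power cycle; the last lines before a gap are a reasonable place to start reading. This is only a minimal Python sketch under assumptions: it expects syslog-style lines that begin with a `Mon DD HH:MM:SS` timestamp, and the function names `find_gaps` and `report`, the 10-minute threshold, and the example path `/var/log/kern.log` are illustrative choices, not anything confirmed in this thread.

```python
from datetime import datetime, timedelta

def find_gaps(lines, year, min_gap=timedelta(minutes=10)):
    """Return (last_line_before_gap, first_line_after_gap, gap) tuples."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        stamp = line[:15]  # e.g. "Oct  3 14:05:09"
        try:
            # Syslog stamps carry no year, so the caller supplies one.
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if prev_time is not None and when - prev_time >= min_gap:
            gaps.append((prev_line.rstrip(), line.rstrip(), when - prev_time))
        prev_time, prev_line = when, line
    return gaps

def report(path, year):
    """Print every silent gap found in the log file at `path`."""
    with open(path, errors="replace") as fh:
        for before, after, gap in find_gaps(fh, year):
            print(f"--- silent for {gap} ---")
            print("last:", before)
            print("next:", after)
```

Calling something like `report("/var/log/kern.log", 2019)` on a copy of the log would list each gap. A quiet system can also produce gaps with no lockup involved, so treat the results as hints rather than proof.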

My guess is some sort of incompatibility with the hardware. Have you checked the hardware compatibility list? I have no experience with those devices, and it’s the first I’ve heard of XCP-NG being installed on one.

There wasn’t much hardware to select – I added RAM and disks:
RAM Modules: SD4/16G2133EMB MTA16ATF2G64HZ-2G1B1 16GB SODIMM DDR4 2133MHZ 2RX8
Disks: Samsung SSD 860 EVO mSATA 1TB MZ-M6E1T0BW
Samsung 860 EVO 1TB 2.5 Inch SATA III Internal SSD (MZ-76E1T0B/AM)

In terms of hardware compatibility, there wasn’t much of a list: https://protectli.com/wp-content/uploads/2017/08/FW6A-Datasheet-180815-1.pdf

I remember looking at a bunch of things before purchasing these items. My guess is that if there is a compatibility issue, it would be the RAM modules, but I’m not sure how to test this theory. I’m not even sure a complete reinstall would help.

Not sure if it’s an option for you, but what about XCP-NG 8?

We were running XenServer 7.0 for a while, and there was a bug with certain hardware that caused hosts to reboot. After we submitted a ticket with Citrix, it was addressed in the next version, 7.1. Might be worth looking into.

Run a SMART report on the SSDs.
I had an issue like that before and it turned out to be a drive that had gone bad.
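If reading the SMART report by eye is tedious, a small script can pull out the attributes that usually point at a failing drive. This is a rough Python sketch, not a definitive check: it assumes the tabular output of `smartctl -A` from smartmontools (ID in the first column, raw value in the tenth), Python 3.7 or later, and a watch list (`WATCHED`) that is my own guess at useful attributes; which attributes a given SSD reports varies by vendor. The function names are made up for illustration.

```python
import subprocess

# Attributes commonly watched for failing drives; this set is an assumption
# and not every SSD reports all of them.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable", "UDMA_CRC_Error_Count"}

def parse_attributes(text):
    """Map attribute name -> raw value from `smartctl -A` table output."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            attrs[parts[1]] = parts[9]
    return attrs

def suspicious(attrs):
    """Return watched attributes whose raw value is a nonzero integer."""
    return {k: v for k, v in attrs.items()
            if k in WATCHED and v.isdigit() and int(v) > 0}

def check_drive(device):
    """Run smartctl on one device (needs root and smartmontools installed)."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    return suspicious(parse_attributes(out))
```

For example, `check_drive("/dev/sda")` returns an empty dict when none of the watched counters are nonzero. A clean result does not rule out a bad drive, so the overall health verdict from `smartctl -H` is still worth checking.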

Try running memtest on the system, but it sounds like you have a hardware issue.

SMART tests on both SSDs were OK; I just needed to make a bootable USB for memtest86.

So I ran memtest86 on the RAM modules - four passes.

6 errors were encountered on pass #1, but no errors were reported in the remaining 3 passes. I assume this still means bad RAM?

There could possibly be an issue; see https://www.memtest86.com/troubleshooting.htm