I have a pair of ancient Dell PowerEdge SC440 towers (the Xeon ones) that I have been using for my very experimental, absolutely non-critical homelab, mostly to dip my toes into getting stuff done on actual hardware, as I do not work in IT or have any professional training in the field. They have been running XCP-ng 8 for a few months, and I’ve been playing around with VMs and some Docker stuff on Fedora CoreOS.
One of these has been failing to boot and detect RAM lately, and I was wondering if slapping in some newer motherboards, processors and memory is actually worth it, or if I should simply let them go and try to save up for something new. I have not opened it yet to look at the caps but I’m guessing it’s not good.
I am hoping to actually have a more decent homelab to run some more services I can depend on, ideally NextCloud or something like that. I already have an SBC dedicated to HomeAssistant.
A little context:
I got this for free when they were pretty much obsolete. I also spent $0 on some hard drives handed down to me.
I live in the third world, where shipping and customs are too expensive to purchase anything decent off eBay.
Those nice 1U or 2U multicore lotsa RAM servers that sell on eBay for $300-$400 sell for $1000-$1500 locally.
The motherboards in those are Dell specific, so you will have to do some work to replace one with a standard board. Before that, try something simple such as removing the memory and putting it back in, and making sure there is no dust in the slots.
Yeah, they’ve got that odd shape and the position of the processor is atypical. I was kinda hoping the screws and standoffs would align to one of the ATX form factors but haven’t really checked.
I’ll try to give it a good clean and retry, thanks.
One thing to note, though it may take a while to happen: XenServer is pushing the use of UEFI, and the next XCP-ng may require it as well. There was a warning about this when I upgraded to the 8.3 release. When I asked on their forums I was told that they may or may not follow XenServer on this path.
I would pull the processor and use a magnifier to check for bent pins in the CPU socket. I had a classroom full of Xeon X56xx series computers and these often had trouble detecting the RAM. The fix was pretty much universal and involved pulling the processor, finding the slightly bent contact, carefully nudging it until it looked like the rest around it, and putting the processor back in. It was a real pain and I'm glad those are gone; they took a lot of time to keep running. It seems your servers are of a similar age, so they might have a similar problem. The CPU sockets on mine were made by Intel.
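If you want to see how one of these boxes is booting today, here is a minimal sketch. It assumes you can run Python on a Linux system on that machine, and it relies on the usual Linux convention that the `/sys/firmware/efi` directory only exists when the system was booted through UEFI. The function name `boot_mode` is just something I made up for illustration.

```python
import os


def boot_mode(sysfs_root="/sys"):
    """Report how the running Linux system was booted.

    On Linux the directory <sysfs_root>/firmware/efi is only present
    when the kernel was started through UEFI firmware.
    """
    efi_dir = os.path.join(sysfs_root, "firmware", "efi")
    return "UEFI" if os.path.isdir(efi_dir) else "legacy BIOS"


if __name__ == "__main__":
    print("Booted via:", boot_mode())
```

The `sysfs_root` parameter is only there so the check can be pointed at a test directory; on a real machine you would call it with no arguments.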
I was told this pin shift was caused by heat cycles from powering the computer off and back on the next day. Leaving them on didn’t fix the issues (but did make the room nice and toasty warm in winter), or even make it happen less often. I warranty serviced a pile of them before I just started fixing them myself to save driving across town twice. It really got bad after about the first 3 years of service.
Can you get an AMD 5800H powered mini PC for a reasonable price there? They probably have more processing power and more RAM than your current devices. These are also in the $250 to $350 USD price range brand new on Amazon, cheaper on AliExpress.
This is what I’m looking at for when I need to replace my lab in a few years. You would probably be at nearly $500 USD after upgrading the RAM to maximum and adding a couple of SSDs. I'm hoping that XCP-ng will get a generic ARM version released; that might really help people like you who don’t have the resources that we do here. There are some really powerful Rockchip based single board computers on the market. The amount of RAM is really the biggest limiting factor, but they do have it “running” on Raspberry Pi.
I’ll try to crack those suckers open this weekend. I must say that aside from a couple of drives that died, I’ve had zero problems with them so far. I do know they’re archeological curiosities…
Those mini PCs appear on and off at stores here; as a matter of fact, my daily driver at work is a Gigabyte mini PC with an i5 and the RAM maxed out.
First thing I would do is check the CMOS battery… Dell is a bit picky when the voltage goes below 2.6V… or maybe it was 2.4… anyway… start with the battery.
Second… check the PSU voltages. If a rail is just a bit too low or just a bit too high, you get a gremlin in the machine… it might work sometimes and sometimes not.
Also check the motherboard's capacitors to see if any of them look odd in shape compared to the others… That can mean a bad cap… (This was more of a problem on older motherboards.)
Third… remove the RAM sticks… and rub the contact surfaces on the sticks with a bit of dry kitchen paper or toilet paper (if you don't have electronics cleaner such as isopropanol).
Paper is slightly abrasive, so it polishes the contact surface without damaging it.
Then put them back and try to boot again.
Fourth… CPU… I don't believe you have bent pins unless you have removed the CPU and bent a pin while doing it… and I think you haven’t done that.
But it can be a bad connection (a really rare thing)…
Remove the CPU cooler and clean the cooling paste off the CPU with some paper…
Then loosen the CPU tension bracket (don't remove it, just release the tension and leave the CPU sitting in the socket) and with a light touch of your fingertip give it a tiny wiggle, so the pins shift position on a microscopic level and make a bit better contact again… Then re-engage the tension bracket, apply new cooler paste and put the CPU cooler back.
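For the PSU check in the second step above, here is a rough sketch of how to judge multimeter readings. It assumes the supply uses the usual ATX rail voltages (+3.3V, +5V, +12V) with the common ±5% tolerance on those positive rails; the readings in the example are made-up numbers, not measurements from an SC440, and the function names are only for illustration.

```python
# Nominal ATX rail voltages. The +/-5% tolerance for the positive rails
# is assumed here; check the label or documentation of your own PSU.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def rail_in_spec(nominal, measured, tolerance=TOLERANCE):
    """Return True if the measured voltage is within tolerance of nominal."""
    return abs(measured - nominal) <= nominal * tolerance


def check_rails(readings):
    """Map each measured rail name to True (in spec) or False (out of spec)."""
    return {
        rail: rail_in_spec(NOMINAL_RAILS[rail], volts)
        for rail, volts in readings.items()
    }


if __name__ == "__main__":
    # Hypothetical multimeter readings, for illustration only.
    example = {"+3.3V": 3.3, "+5V": 5.1, "+12V": 11.2}
    for rail, ok in check_rails(example).items():
        print(rail, "OK" if ok else "OUT OF SPEC")
```

A rail that sits right at the edge of the range is worth re-measuring while the machine is under load, since that is when a marginal supply tends to sag.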
I thought the same as #4, until I saw it for myself. There was a bad batch of sockets out there, and this really happened on computers that I had never opened until the problem started.
May have been a manufacturing error when the processor was first installed, may have been something else, but it took 2 years to start happening.
Yeah, I have seen production faults too,
but most often you get the gremlins within a month, or at least within the first year…
Gremlins are really fun. lol (being ironic)