I have a problem with my FreeNAS server and am hoping the community can help out. FreeNAS is currently running on a Dell R720, though I have used FreeNAS for several years.
Web Interface - becomes sluggish and/or hangs - When browsing from tab to tab, FreeNAS will sometimes hang or take 30+ seconds to load. No specific tab does this more than the others.
Netdata - stops loading counters/hangs - Netdata will show current counters (Disk read/write, IPv4 Inbound/Outbound, and CPU) for about 10 seconds. After that, either some counters keep updating while the graphs stop moving, or everything starts working again.
SSH - becomes unresponsive after about 10 seconds and eventually closes the session.
Fresh Install on new SSD - FreeNAS was originally installed on Dell’s internal memory cards, but I thought that might have been the issue, so I purchased a brand-new SSD and did a fresh install. I was also on the most up-to-date version of FreeNAS but have downgraded to 11.1, which I had used for years with few or no issues, although that was on different hardware.
I had one memory DIMM reporting the error “Correctable memory error rate exceeded for DIMM_A4” and have replaced it. I went through the logs on the server, but I can usually only get one quick command off before SSH disconnects.
Memtest86 - Ran it over only about 20% of the total memory because it takes so long
CPU Load Test - Ran 100% CPU load on server for 30 minutes
192 GB of ECC Memory
Intel Xeon CPU E5-2620
Intel 2P X520/2P I350 rNDC
Any help or ideas would be greatly appreciated.
What version of FreeNAS are you running, and have you gone through /var/log to check for errors?
Have you checked the output of /var/log/messages for any anomalies?
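If reading the whole file by hand is painful, a small script can pull out the suspicious lines first. This is only a rough sketch: the keyword patterns below are my own guesses at what FreeBSD kernel messages for memory, disk, and network trouble tend to contain, not an official or complete list, so adjust them for your hardware. `scan_lines` and `scan_file` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed keyword patterns (not an exhaustive or official list).
PATTERNS = {
    "memory": re.compile(r"\b(MCA|ECC|machine check)\b", re.IGNORECASE),
    "disk": re.compile(r"\b(CAM status|SCSI status|retrying command)\b", re.IGNORECASE),
    "network": re.compile(
        r"\b(link state changed|watchdog timeout|is using my IP address)\b",
        re.IGNORECASE,
    ),
}


def scan_lines(lines):
    """Return (counts, hits): per-category match counts and the matching lines."""
    counts = Counter()
    hits = []
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[category] += 1
                hits.append((category, line.rstrip()))
                break  # count each line once, under the first matching category
    return counts, hits


def scan_file(path="/var/log/messages"):
    """Scan a log file; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as fh:
        return scan_lines(fh)
```

Running `counts, hits = scan_file()` on the server and printing the last few entries of `hits` gives a short list to paste into the thread. If SSH keeps dropping, copying /var/log/messages to another machine and scanning it there works just as well.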
If you’re having issues with SSH disconnects, use screen or tmux so you can resume the session.
I also usually install Nagios across all our BSD machines for monitoring, which can help identify issues.
You could also set up an ELK stack and redirect logs to it for debugging.
FreeBSD is fairly solid on the R720.
Thanks for the reply. I’m currently on FreeNAS-11.1-RELEASE. I was on 11.2-U4 but was having issues, so I downgraded to 11.1. I did review the /var/log folder but didn’t see a smoking gun.
Here’s /var/log/messages from this morning.
Currently there is a Service:nas-health Warning message, but I have been unable to track it down, and it looks like it could be a bug in 11.1.
Do you have iDRAC set up to allow remote console redirect? This seems like a good use case for running a network or serial console to help debug.
Also, I’d strongly suggest running the entire memtest86 suite multiple times, even if the system is down for a weekend. If you’d like, you can disable the ECC injection tests, I suppose. Sometimes corner-case hardware failures take a couple of memtest runs to reveal the problem.
Hmm, I have a similar issue. I’m running U4. I haven’t had a problem with SSH per se, only the web GUI. If I haven’t logged into the web GUI for a couple of days, it takes about 30 seconds to come up in the browser, which is super annoying. After I’m logged in, however, all other pages load at normal speed. I’m not sure I have the same issue as you. I once had some bad RAM in another system, and weird things like what you describe would happen from time to time. To debug that problem I ran SMART tests on all the drives and then ran memtest86, which revealed the bad RAM. Luckily I could RMA the RAM, and after getting a couple of good sticks I never had any more problems. I have 64 GB of ECC RAM in my FreeNAS box and ran memtest when I was installing the system. I think it took about 3 days or so.
Yeah, I have iDRAC access, and I did find a bad memory DIMM that was swapped last weekend before running memtest86. Running memtest86 is something I’ll need to do.
Check the basics: an IP conflict is consistent with the symptoms you’re describing. I think hardware faulty enough to interfere with the web server or an SSH session would also interfere with other FreeNAS activity.
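One way to test the IP-conflict theory from another machine on the same subnet is to sample its ARP table for a while and see whether the NAS address ever maps to more than one MAC. The sketch below assumes `arp -an` is available and prints entries in the usual `? (ip) at mac ...` form. The helper names are made up for illustration, and a MAC that changes for a legitimate reason (NIC swap, failover) will be flagged too.

```python
import re
import subprocess
import time
from collections import defaultdict

# Matches "(192.168.1.10) at 00:11:22:33:44:55"; incomplete entries are skipped.
ARP_LINE = re.compile(
    r"\((\d{1,3}(?:\.\d{1,3}){3})\) at "
    r"([0-9a-fA-F]{1,2}(?::[0-9a-fA-F]{1,2}){5})"
)


def macs_by_ip(snapshots):
    """Collect every MAC address seen for each IP across ARP table snapshots."""
    seen = defaultdict(set)
    for text in snapshots:
        for ip, mac in ARP_LINE.findall(text):
            seen[ip].add(mac.lower())
    return seen


def conflicts(snapshots):
    """Return {ip: [macs]} for IPs that were seen with more than one MAC."""
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip(snapshots).items()
        if len(macs) > 1
    }


def sample_arp(count=10, interval=5.0):
    """Capture `arp -an` output `count` times, `interval` seconds apart."""
    snapshots = []
    for _ in range(count):
        result = subprocess.run(["arp", "-an"], capture_output=True, text=True)
        snapshots.append(result.stdout)
        time.sleep(interval)
    return snapshots
```

Pinging the NAS while `conflicts(sample_arp())` runs keeps the ARP entry fresh. An empty result does not rule a conflict out, but two MACs showing up for the NAS address would be a strong hint.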