I think this is my second post.
I am curious what others are doing to monitor their networks and/or unifi specifically. I’ve started looking into Zabbix over the past week, but I’m not up-to-speed with it just yet. One of the things that I really liked about Unifi and the Unifi Controller software is the fact that it can alert you if a device is down. However, in actual use what I’m finding is that many of the devices on my controller will randomly report as being offline; and when I check their status in the controller they are online.
As you can imagine, this produces a lot of noise, as it is all email. I had at one point turned on the alerts to indicate that the device was connected, and that just doubled the amount of emails and noise in my inbox.
So I’ve concluded that this is not an actual “disconnect”. In about 99% of my cases, the problem is really just a missed “heartbeat”, or a single call to the controller that is not being recorded.
The reason I’m posting this is that over the past 24-48 hours I have noticed a lot more disconnect notices than usual. It can be an AP, it could be a switch, it could be a router/gateway. In every instance, when I check the actual status in the controller there is no problem; the events just say the devices were connected.
Currently I am hosting the following: Unifi controller 6.0.43 at Vultr, running on Ubuntu 16.04 with 2 virtual cores and 4 GB of RAM. There are 95 devices added to it across about 32 sites. I expect to add another 30-40 devices to one of the sites this month.
I have seen this How to Tune article (UniFi - How to Tune the Controller for High Number of UniFi Devices – Ubiquiti Support and Help Center) and I’ve checked my system.properties file for the recommendations made there for heartbeat missed.
My settings are as follows.
When I look at the CPU, memory, and even disk usage on the controller it all looks pretty low and reasonable. A few spikes at times but nothing that would cause me alarm. I am however considering increasing the plan to one with 4 virtual cores and 8 GB of RAM when I bring on the additional devices.
Each site is different: some are on a static IP using a fibre-to-the-business type connection, some are cable with static IPs, and others are fibre and cable with dynamic IPs. There is no common connection type among the devices when they disconnect/connect.
Should I adjust the inform settings noted above?
Does anyone have a recommendation of what settings to set?
Or should I walk down another path for monitoring the devices for disconnection? I mean the main thing I want to know is if a device is actually offline. If it misses a ping for a moment that’s no big deal as long as it is still functioning.
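To make that last point concrete, here is a rough sketch of the behaviour I have in mind: only raise an alert after several failed checks in a row, and stay quiet about a single missed one. This is just an illustration, not anything the Unifi controller or Zabbix provides as written. The class name, the threshold of 3, and the device name "ap-lobby" are all made up, and the True/False results would have to come from whatever check is actually used (ping, SNMP, etc.).

```python
from dataclasses import dataclass, field


@dataclass
class OutageDetector:
    """Flags a device as down only after several consecutive failed checks."""

    threshold: int = 3  # hypothetical value: failed checks in a row before alerting
    _misses: dict = field(default_factory=dict)  # device name -> consecutive failures
    _down: set = field(default_factory=set)      # devices already reported as down

    def record(self, device: str, reachable: bool):
        """Record one check result; return "down", "recovered", or None."""
        if reachable:
            self._misses[device] = 0
            if device in self._down:
                self._down.discard(device)
                return "recovered"
            return None
        self._misses[device] = self._misses.get(device, 0) + 1
        if self._misses[device] >= self.threshold and device not in self._down:
            self._down.add(device)
            return "down"
        return None


# Simulated check results for one device: a single blip, then a real outage.
detector = OutageDetector(threshold=3)
results = [True, False, True, False, False, False, False, True]
events = [detector.record("ap-lobby", ok) for ok in results]
```

With these simulated results, the single missed check produces nothing, the third failure in a row produces one "down" event, and the next successful check produces one "recovered" event. That would be two emails instead of one per missed heartbeat.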
Thanks in advance for any recommendations!