Unifi controller not readopting most devices from local site, but remotes are fine

I’ve been self-hosting my controller for a couple of years now, and have five remote sites in addition to my local site. A couple of months ago, my local site’s devices got into an adoption cycle that has been nearly continuous. I have applied several updates in addition to the log4j updates, and I have completely uninstalled and reinstalled the controller, restoring only the settings from backup. I also updated Java. I’ve power cycled devices, and that gets them to connect and provision for a couple of minutes before the controller loses them, just long enough for them to start updating to the current version.

I’ve been making do, but now I need to make some changes and am unable to do so. I’ve uploaded a video to YouTube in case anyone wants to see it in action: UNIFI CONTROLLER ADOPTION ISSUE - YouTube

system specs:
VM win 10 1901
controller 6.5.55

Any thoughts as to what I need to do to fix it?

When I read that this problem only happens with local devices, but not with remote ones, my first thought was that there is an issue with reaching the inform address from inside the network. I’m not convinced that this is it, though.

Are you using the same url for the inform address for internal and external devices? If so, do you have split DNS or NAT reflection set up?

It would probably be considered split DNS. The inform address is the same, but there are no tunnels for the remotes to talk back, just firewall/routing rules and public name resolution. Local is AD DNS with a static entry for the controller. The remote sites are separate sites in the controller, not remote devices for the same site.

Edit: the controller and the local devices are all on the same subnet, so that I wouldn’t have to deal with yet more rules.

Can you ping the inform address from that subnet?

Yes, I can ping the FQDN and it correctly resolves to the private IP address, and I get good responses from the controller’s host.
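
For anyone who wants to repeat the name-resolution check from a PC on that subnet without reading it off ping output, here is a minimal Python sketch. It uses only the standard library; the function names are made up for illustration, and the hostname you pass in is whatever your own inform FQDN is.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def classify(addresses):
    """Label each address as 'private' (RFC 1918 and similar ranges) or 'public'."""
    return {
        addr: "private" if ipaddress.ip_address(addr).is_private else "public"
        for addr in addresses
    }
```

Running `classify(resolve_ipv4("your.inform.fqdn"))` on the local subnet should label the controller's address as private if split DNS is doing its job. The same call from a remote site would be expected to show a public address.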

If you ssh into one of the devices that is having trouble and type “info”, do you see the inform address you expect it to have? Have you contacted Unifi support about this? I’ve seen devices that looked stuck in an adoption loop when they were actually fine; it was some sort of UI bug that went away after we deleted the cookies. I’m not sure that is what is going on here after looking at your YouTube video. Have you tried resetting one of the devices by holding down the button?

I’ve not contacted Unifi support yet; I wanted to try to fix it with the community first.
Looking at the info output, the inform address looks correct, but it is timing out. I pinged the controller from the device and it replies. I have not tried resetting a device yet. I do have a couple that I can try without disturbing much.

You deleted the cookies in the browser? I don’t know that it would fix my issue, since the problem shows up on several computers and in the app on my phone, but that’s easy enough to try out.

Thanks for the couple of suggestions to try.

Model: USW-Flex
Version: 5.76.7.13442
MAC Address: f4:92:bf:a6:cb:c3
IP Address: 192.168.10.14
Hostname: Porchflex
Uptime: 3323253 seconds

Status: Timeout (http:// unifi .xxxxxx .zzzzzz. net:8080/inform)
Porchflex-US.5.76.7# ping 192.168.10.55

Porchflex-US.5.76.7# ping unifi .xxxxxx .zzzzzz. net
PING unifi. xxxxxx .zzzzzz. net (192.168.10.55): 56 data bytes
64 bytes from 192.168.10.55: seq=0 ttl=128 time=0.871 ms
64 bytes from 192.168.10.55: seq=1 ttl=128 time=0.901 ms
64 bytes from 192.168.10.55: seq=2 ttl=128 time=0.795 ms
^C

**spaces in URLs are to break links in this post
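
As a side note, the uptime figure is easier to judge in days than in seconds. Here is a small Python sketch, assuming the `info` output is pasted in as plain "Key: value" text. The helper names are hypothetical and not part of any Unifi tooling.

```python
from datetime import timedelta


def parse_info(text):
    """Turn 'Key: value' lines into a dict, skipping lines without a value."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses and URLs stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def uptime_of(fields):
    """Read an 'Uptime: N seconds' field as a timedelta."""
    seconds = int(fields["Uptime"].split()[0])
    return timedelta(seconds=seconds)
```

For the output above, 3323253 seconds works out to 38 days, 11:07:33. That would suggest the switch itself has stayed up the whole time and only its session with the controller is dropping.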

With all due respect to the community, I feel that the manufacturer is always the first stop for support, even though Unifi support isn’t always the best. When it’s a “send and hope you hear back” form, you can always do both at the same time.

Because the device itself is saying that it is timing out this does not appear to be something that the cookie trick would fix. In that scenario the device kept readopting, but only in the web browser, not on the device.
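
Worth noting that ping only proves ICMP gets through, while the inform itself is an HTTP request over TCP (port 8080 in the status line above). Below is a rough Python sketch for testing the TCP side from a machine on the same subnet. The function name is just illustrative, and it only checks that the port accepts a connection, not that the controller answers the inform correctly.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

With the addresses from this thread, that would be `tcp_port_open("192.168.10.55", 8080)`. If it returns False while ping works, something between the client and the controller service is dropping TCP 8080 specifically. If it returns True but devices still time out, the problem is more likely on the controller side.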

I would reboot an AP, and if that doesn’t fix it, reset an AP (you can do this via ssh using the set-default command). If that doesn’t fix it then you are in trouble, haha. I would next reboot the controller for good measure, then make sure all of your updates are done (controller, firmware of each device, network version, etc.), and think about rebuilding from scratch, not restoring a backup, on a different platform (such as a Cloud Key) and migrating everything over.

I’ve rebooted several devices. They all connected for between 2 minutes and an hour. Now they are cycling again. So I defaulted an AP, forgot it in the controller, and re-adopted it. It stayed connected for about 2 hours before going back to cycling. Its info status now says unknown. I then also restarted the VM that the controller is on, so that everything restarts. All the other sites have fully reconnected, but the devices on the local site have not.

What would be very interesting now is to determine whether these problematic APs behave the same when adopted in another site and connected to the controller over internet.

Yeah, I am thinking this may well be an issue with the system the controller is running on. What do the firewall rules look like? Is there any sort of AV running?

If they were fine for an hour, and everything Unifi is up to date and you didn’t change anything, then they should be fine for a day.

That is an interesting thought. I might have to try that.
As for the firewall on the VM, there are allow rules for the exe and the ports. I even tried turning the firewall off for a couple of days. I’ve also put floating rules in the network firewall pointing to the controller’s IP and ports, just in case traffic was somehow on a different VLAN despite the correct IP address. The AV is just Defender. However, if AV were causing the issue, would it not interfere with the remote sites too?

The AV was a bit of a stretch and it sounds like your firewall rules should work. Did you contact Unifi support yet?

I would next ssh into one of the devices and set the inform again. Unifi says that in some cases you have to set the inform twice. If there are minor issues, the second inform forces it to provision. I read on the internet (which is never wrong) that if the DNS IP isn’t specifically programmed into each device, for example if you let them use DHCP, there could be trouble too. I always set static IPs on my network infrastructure devices, so I have never tried it the other way. This article is about remote devices; the second inform is supposed to happen in the background for local devices, but I understand that sometimes it doesn’t work. https://help.ui.com/hc/en-us/articles/204909754-UniFi-Device-Adoption-Methods-for-Remote-UniFi-Controllers
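
Before retyping the inform by hand, it can help to sanity-check the URL so a typo doesn't add a second problem. Here is a minimal Python sketch, assuming the default layout seen in the status line earlier in the thread (plain http, port 8080, path /inform). The helper names are invented for illustration, and a controller configured with a non-default inform port would need the check adjusted.

```python
from urllib.parse import urlsplit


def check_inform_url(url, expected_port=8080):
    """Return a list of problems with an inform URL; an empty list means it looks fine."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme != "http":
        problems.append("scheme should be http")
    if not parts.hostname:
        problems.append("missing host")
    if parts.port != expected_port:
        problems.append(f"port should be {expected_port}")
    if parts.path != "/inform":
        problems.append("path should be /inform")
    return problems


def set_inform_command(url):
    """Build the command line to type at the device's ssh prompt."""
    return f"set-inform {url}"
```

For example, `check_inform_url("http://192.168.10.55:8080/inform")` comes back empty, and `set_inform_command(...)` gives the line to paste into the device shell. Per the advice above, that command may need to be run a second time.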