Unifi Site unable to connect to controller

During this pandemic, we have been selling a lot of Unifi USGs with Unifi APs to home users, all managed from our hosted controller. Yesterday we installed one of these setups, but it won’t connect to the controller over the internet. Over the LAN it works, but as soon as we plug it in at the client’s site it won’t show up on the controller. The pfSense logs show the rule passing when I pull up the log for that IP on that port. The only ports we have open on our network are the Unifi ones, 8080 and 3478. I could understand it if my firewall configuration were wrong, but only one site out of all of them is failing. The Unifi controller log doesn’t show anything either. I have tried all of the following, but none of it has worked:

Disabling VM Firewall
Rebooting pfSense
Rebooting Unifi VM
Recreating the port forwarding rules

How is the inform IP set in the access point? Is it by domain or by IP?

Can you put this AP in a DMZ on pfSense, just to see if a connection is made?

@Thedannymullen

I set the inform address as a domain.

Everything is at the client’s site, and one of the APs is mounted on a very high ceiling. I’m trying to avoid going to their home unless I need to.

I loaded iftop onto my Unifi controller and I can see the connection coming in. My guess is that the Unifi controller is having problems recognizing the devices at this site. I’ll try readopting them when I can get to the client’s home.

Update: I am confused on another level. I reset one device and tried to readopt it, and it won’t show up on the controller. I took one of the devices with me and plugged it into another internet connection (same ISP, about half a mile down the road) and it works. Now I have to figure out what at this client’s site is causing this.

Is the controller you are accessing located offsite from the client’s home?

If so, I suspect a firewall or DNS issue. Maybe it can’t get through a port or resolve the domain name.
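
One way to test the "can’t get through a port" half of that from a laptop plugged in behind the client’s USG is a plain TCP connection attempt to the controller’s inform port. This is only a rough sketch: `can_reach` is a made-up helper name, and it covers only TCP 8080 (STUN on 3478 is UDP, so a TCP connect says nothing about it).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    DNS failures, refused connections, and timeouts all count as
    "not reachable" (they are all subclasses of OSError).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_reach("unifi.example.com", 8080)` with your real controller hostname in place of the placeholder should return `True` from a working site. A `True` from the dark site would only show that the TCP handshake completes, not that the controller accepts the inform itself.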

@Thedannymullen

The controller is one we host. All of our other sites are working just fine. The device is resolving the correct IP and making it through the port. Both our pfSense logs and iftop on our Unifi VM show the correct IP making it through both firewalls. What really has me lost is that we can move this client’s devices to other sites, such as my house, other clients, and our office (I even went to the local library to test it), and they work every time. The only time they fail to connect is when they are at this client’s site. The one device I reset to try to readopt won’t connect to the controller, so I have to get it on another network to adopt it.

It can’t be the devices because they work when moved to another connection.
It isn’t the network connection. I can see the traffic coming into our network, so things are not getting blocked.
I don’t think it’s our controller because I have added devices to it since then without issues and none of the other sites have issues like this.

This seems like a bug of sorts, but no error shows up in the logs that would help explain where I should be looking to track it down.

@BoatYardJunkie can you bypass the firewalls at the client site for one AP and see if it connects properly?

@Thedannymullen

If it was the client’s firewall, wouldn’t that block it from getting to the controller VM?

The firewall at the client’s is a USG anyway, so I don’t think it should be blocking any of the controller connections.

I think the USG has a feature called announce controller. Maybe this feature is misconfigured, so another controller is being announced.

I’m a little bit confused: is the device you refer to a USG behind a USG? Or is the device an access point that does not show up?

@blex

The client has a Comcast modem in bridge mode, then there is a USG and 2 AC-LR access points. Nothing at the client’s site shows up in our controller when it’s on that connection. I have brought the setup to other sites (my house, office, other client, even the public library) and it works without a problem. Plus I can see the connections in iftop on the VM that hosts our controller when it is at the client’s site.

I would then go to the controller, forget the device, and try adopting it again.
Also create an SSH user on the USG and try a DNS lookup from it. Maybe there is something odd going on.

You have found something very strange here. The only other idea I have for diagnostics is to put something in front of the USG (like a pfSense box), bring up the internet connection on the pfSense, and see if that works. (A laptop with one USB NIC and an onboard NIC, passed through to a VM running pfSense, would do.) If it then works, something is odd with the USG. If you would like to drill down further, you can put a PPPoE server on the pfSense and try a PPPoE dial-in from the USG.

This could be a long shot, but try SSHing into the device and setting the inform URL that way. I have a few sites that I adopt at my office, but when I install them onsite they forget the controller. If I set the inform via SSH, it fixes that issue for me. Hopefully it will do the same for you, though I use an IP, not an FQDN.
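
For anyone following along, the command typed on the device after SSHing in is `set-inform` followed by the controller’s inform URL. A small sketch of how that string is put together, assuming the default inform port 8080; `set_inform_command` is just an illustrative helper, and the hostname below is a placeholder, not the poster’s controller. Depending on firmware, the command may need to be run from inside `mca-cli` rather than the plain shell.

```python
def set_inform_command(controller: str, port: int = 8080) -> str:
    """Build the command to type on a UniFi device over SSH.

    `controller` can be an IP address or an FQDN. The port default
    assumes the controller listens for informs on 8080.
    """
    return f"set-inform http://{controller}:{port}/inform"


# Placeholder hostname, replace with the real controller address.
print(set_inform_command("unifi.example.com"))
```

Trying it once with the FQDN and once with the controller’s public IP would also show whether name resolution on the device is part of the problem.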

Update: I still have the one site down. I put a Cloud Key in as a temporary solution. We moved out of our office soon after the pandemic hit, because my lease was up and I wanted to move spaces anyway, so I moved almost everything except the servers to my basement. I worked out a deal to keep the servers there and running until I got another space or the landlord was able to rent it out. Last week I moved the servers to my house because someone is moving into the space. About half of my sites have gone dark to the controller and half are OK. I tried forgetting and readopting, but nothing. Same weirdness as that one work-from-home setup we did.

My question now is what is the best next step to take?

  1. Nuke everything and rebuild (factory reset everything from my pfSense box to the controller VM, plus readopt every Unifi product).
  2. Switch to a cloud VM such as AWS or Azure and just readopt the sites that are dark.
  3. Put Cloud Keys in at each client site and not host a controller.
  4. Go to HostiFi.
  5. Other ideas?

Ideally, I would like to host the one controller for all of my sites in-house, but this is starting to interfere with my business and I just need to move past this now.

Did you SSH into an access point and set the inform URL? It sounds to me like most of the APs out in the field may have just become lost due to the controller move.

On how to move forward, I would look to a server in a data center for sure, as it sounds like you may have another move in your future. All the options you listed above are valid, I believe. It is about what is best for you. If you have a fairly large equipment install, HostiFi may be the most bang for your buck.

I can’t imagine the Unifi controller eats up much bandwidth or even compute/RAM. I would consider moving it to a cloud provider.
Are you 100% sure you’re seeing traffic on your controller via proper ports coming from the remote site? If moving the equipment in question resolves your issue, then that tells you it isn’t a configuration issue with the unifi controller / APs. This narrows it down to the USG at the remote site or their ISP. I would verify your controller is actually getting traffic from said site over the ports it needs. Either from a packet capture on your firewall or somehow from the controller itself.

@mikensan
It is definitely getting traffic from the APs in the wild. Running iftop on the VM that hosts my controller, I can see the connections coming in. They just don’t seem to register on the controller for some sites. The sites have a variety of firewalls, from Netgate to Cisco, with different ISPs. I’m having trouble finding a common factor. The only thing I did think of, but I’m not really sure how to test, is whether it’s an ISP issue on whatever route I’m on. My old office was pretty close, and I could definitely see them sharing a first or second hop that some sites wouldn’t share. Tomorrow, to test, I’m going to set up a new controller at a family member’s house a couple of towns away and see if that solves it.

@Thedannymullen
But if they were lost in the move, then forgetting and readopting should work, and it doesn’t. Right now I have to take them to a site that is still up to readopt. I can’t do it on a site that has gone dark.

This is so weird. Question: once you take it to another site and readopt it, can it then be forgotten and readopted? I ask because if the answer is yes, then for some reason the old setup had an issue, and once it is in the new config it works. Like the AP is doing something behind the scenes we are unaware of.

I know I had an issue readopting on my local network when I changed the IP of the controller. One AP appeared to carry over; the other two did not. After everything moved over, there were no issues forgetting and readopting. It was probably some broadcast packet that was seen by one AP and not the other two. Maybe behind the scenes, even though you use a domain, the DNS cache would not update, or somehow Unifi used the IP even though you gave it a domain.
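
A quick way to check the stale-DNS theory is to resolve the inform hostname from a machine at a dark site and compare the answer with the controller’s current public IP. A rough sketch, run from a laptop on the dark site’s LAN so it uses that site’s resolver; the function names are my own, and the hostname and IP in the usage note are placeholders.

```python
import socket


def resolved_ipv4(hostname: str) -> set:
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return {info[4][0] for info in infos}


def points_at(hostname: str, expected_ip: str) -> bool:
    """True if hostname currently resolves to expected_ip at this site."""
    try:
        return expected_ip in resolved_ipv4(hostname)
    except socket.gaierror:
        # The name did not resolve at all.
        return False
```

For example, `points_at("unifi.example.com", "203.0.113.10")` returning `False` at a dark site but `True` at a working one would point to an old record still being served there. This only tests the site’s resolver, not whatever address the device itself has cached.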

@Thedannymullen

Sorry it took so long to respond; I had to get some other things done.

So I can take one from a dark site, readopt it at a working site, and it will work. Bring it to a dark site and it stops working, but if I bring it back to the working site it works. It’s something to do with these sites, but it doesn’t seem to be any of the equipment we manage.

@BoatYardJunkie this is such a weird issue.

A couple questions.

  1. At the sites that don’t work, do they have a static IP or a dynamic one?
  2. Is the ISP in common between any non-working sites and working ones?
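
If it helps with hunting for the common factor, the two questions above can be answered by tallying each site’s attributes against whether it is working or dark. A minimal sketch; the field names (`isp`, `addressing`, `status`) and the helper names are my own invention, and you would fill in the site list from your own records.

```python
from collections import defaultdict


def tally(sites, attribute):
    """Count working and dark sites for each value of one attribute."""
    counts = defaultdict(lambda: {"working": 0, "dark": 0})
    for site in sites:
        counts[site[attribute]][site["status"]] += 1
    return {value: dict(c) for value, c in counts.items()}


def only_dark(sites, attribute):
    """Attribute values that appear at dark sites but at no working site."""
    return sorted(
        value
        for value, c in tally(sites, attribute).items()
        if c["working"] == 0 and c["dark"] > 0
    )
```

Each site is a dict such as `{"name": "site-a", "isp": "isp-1", "addressing": "dynamic", "status": "dark"}`, where `status` is either `"working"` or `"dark"`. A value returned by `only_dark(sites, "isp")` is a candidate common factor, nothing more.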

If the IP is dynamic, you could try to force getting a new IP by changing the MAC address of the main router. Once my ISP blocked me from updating my website. It drove me nuts until I realized what happened. They had blocked just one dynamic IP. I changed it and never had the problem again. I called them out on it and they denied it, but I moved on as it never happened again.
Another time a bad modem actually caused an issue like this. I could not explain it, but I could get to some sites and not others. I just swapped modems and everything worked. There had been a lightning storm overnight and I suspect that messed it up, but it is still an unexplained problem.