We’ve been facing a very strange DHCP issue across a medium-sized UniFi deployment (10+ APs and about 20 switches) where Wi-Fi clients intermittently fail to obtain DHCP leases. Initially, only wireless devices were affected, but over time the issue spread to wired clients on the default LAN as well. Devices would often fall back to a link-local (169.254.x.x) address. I did contact UniFi support, but they haven’t had much to say so far. They asked for a support file, which was failing to download in our self-hosted controller instance, so I decided to upgrade to the new UniFi OS Server. Within 10 to 15 minutes of the upgrade, everything magically started to get IPs, and the Wi-Fi experience indicator on the dashboard went to 100% for DHCP (it had been as low as 49%). However, today the issues have started to appear on some Wi-Fi devices again.
Some other notes:
- We are using pfSense as our gateway. I tried disabling the DHCP server on the LAN network and set up a temporary Windows Server 2025 DHCP server, but saw the same issues. I then tried a completely fresh install of OPNsense, and again with a different broadband connection, with much the same result.
- I tried packet capturing on a laptop, and a load of DHCP requests would go out with no response from the gateway. But the moment I unplugged the switch from the rest of the network (with the OPNsense gateway still plugged in), the laptop would instantly get an IP.
- I also tried unplugging each device on our 48-port Pro PoE core switch to see if I could physically isolate the issue to a particular device or area of the network. It made no difference.
- We tried removing all non-UniFi switches in case this was an STP-type issue, but that also made no difference.
- There have been no issues with wired devices getting IPs in other VLANs. UniFi support asked us to set up a test SSID on each VLAN. Devices that got an IP would get one on every VLAN without an issue, but a Wi-Fi device that wasn’t getting an IP on the LAN would also not get anything on the other VLANs.
- None of the UniFi devices themselves struggle to get an IP. Occasionally, they will get a 192.168.1.20 address, which is outside any of our DHCP ranges, but they always manage to get an IP address from the LAN DHCP within a few minutes.
We tried lots of other silly things, but nothing worked, and the most significant impact was when we moved to UniFi OS, which makes me convinced that this is a UniFi issue. It started about a month ago, but has got really bad in the past few weeks and has been highly disruptive. Restarting the controller also made a difference a couple of times, but not consistently.
I have posted about this on the UniFi forums, and I have seen two other posts there from people facing similar issues. Has anyone else experienced this recently?
Wi-Fi clients not getting DHCP address | Ubiquiti Community
Do you have the proper STP priorities set on your switches? And do you have the proper trunk ports with the proper VLANs assigned?
https://help.ui.com/hc/en-us/articles/24292724428311-Understand-and-Mitigate-Network-Loops-STP
Yes, and usually the controller would say if there is an STP loop, or the topology would go all funny. It is very similar to an STP issue, but with STP issues even static IPs start to have problems, if I recall correctly. The core switch is set to 0 and everything below it that supports STP is set to the next number up. We had to remove a number of non-UniFi switches that didn’t support STP, but the basic UniFi switches have always worked even if they don’t have STP. The network has got bigger over time, but we had no issues like this in the 6 years we have been using pfSense + UniFi.
I also tried physically unplugging each area of the network from the core switch just in case, but it made no difference.
Most things are trunk ports at the moment with only a few VLANs locked down like for CCTV.
The UniFi devices going back to 192.168.1.20 is those devices not getting a DHCP address and falling back to default.
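The two fallback symptoms described in this thread (a 169.254.x.x link-local address on clients, and 192.168.1.20 on UniFi devices) can be told apart from a real lease with a quick check. This is only a rough sketch: the pool boundaries in `EXAMPLE_POOLS` are made-up placeholders, not the actual ranges from this network, and `classify` is a hypothetical helper name.

```python
import ipaddress

# 169.254.0.0/16 is the IPv4 link-local block clients self-assign without a lease.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")
# Fallback address of UniFi devices mentioned in the thread.
UNIFI_FALLBACK = ipaddress.ip_address("192.168.1.20")

# ASSUMPTION: placeholder pools, replace with the real DHCP ranges.
EXAMPLE_POOLS = [("10.0.0.50", "10.0.0.204"), ("10.0.1.1", "10.0.1.253")]


def classify(addr, pools=EXAMPLE_POOLS):
    """Say whether an address looks like a lease or a no-DHCP fallback."""
    ip = ipaddress.ip_address(addr)
    if ip in LINK_LOCAL:
        return "link-local fallback"
    if ip == UNIFI_FALLBACK:
        return "unifi fallback"
    for first, last in pools:
        if ipaddress.ip_address(first) <= ip <= ipaddress.ip_address(last):
            return "inside a DHCP pool"
    return "outside all pools"


if __name__ == "__main__":
    for sample in ("169.254.12.7", "192.168.1.20", "10.0.0.99", "10.0.5.5"):
        print(sample, "->", classify(sample))
```

Running this over a client list exported from the controller would show at a glance how many devices are sitting on a fallback address rather than a lease.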
A packet capture at both the gateway and the AP uplink will show whether the DHCP OFFERs are leaving the gateway but never reaching the client, which is my guess as to what is happening (though I am not sure why). I would also check the logs from the DHCP server.
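To compare the two capture points, something like the following could help. It is a minimal sketch that assumes you have already extracted the raw UDP payloads of the DHCP packets (ports 67/68) from each capture with whatever tool you prefer; the function names and the synthetic payload builder are made up for illustration. It reads the transaction ID and the DHCP message type (option 53) and lists the transactions where an OFFER was seen at the gateway but not on the client side.

```python
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of DHCP options
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def parse_dhcp(payload):
    """Return (xid, message type name) for a BOOTP/DHCP payload, else None."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        return None
    (xid,) = struct.unpack("!I", payload[4:8])
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:      # end option
            break
        if code == 0:        # pad option is a single byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            return xid, MESSAGE_TYPES.get(value[0], "UNKNOWN")
        i += 2 + length
    return None


def offers_lost_in_transit(gateway_payloads, client_payloads):
    """Transaction IDs with an OFFER at the gateway but none at the client."""
    def offer_xids(payloads):
        parsed = (parse_dhcp(p) for p in payloads)
        return {p[0] for p in parsed if p and p[1] == "OFFER"}
    return offer_xids(gateway_payloads) - offer_xids(client_payloads)


def example_payload(xid, msg_type):
    """Synthetic payload for demonstration only, not a complete DHCP packet."""
    header = bytes([1, 1, 6, 0]) + struct.pack("!I", xid) + bytes(228)
    return header + MAGIC_COOKIE + bytes([53, 1, msg_type, 255])


if __name__ == "__main__":
    gateway = [example_payload(1, 1), example_payload(1, 2), example_payload(2, 2)]
    client = [example_payload(1, 1), example_payload(1, 2), example_payload(2, 1)]
    print(sorted(offers_lost_in_transit(gateway, client)))
```

If the resulting set is non-empty, the OFFERs are being dropped somewhere between the two capture points. If the gateway capture shows no OFFERs at all, the DISCOVERs are probably not reaching the server in the first place.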
Does the pfsense happen to be virtualized?
I’m not too familiar with what DHCP packets look like on a healthy network, but there is nothing at all for DHCP when I do a capture with no filters on the LAN network at the pfSense end, even for devices that successfully connect. At the device end, there are lots of BOOTP requests going out and no responses.
However, I think that there is a pfSense issue, as the capture window shows no lines and doesn’t limit the capture to 1000 lines, yet the capture file is full of packets when I download it. It was working fine a few days ago before updating to 2.8.1. Since updating, lots of things have broken in pfSense, including the ACME cert manager, the package manager and now the packet capture. But I don’t think this has anything to do with the DHCP issue (the issues predated the update, and I tried other DHCP servers). The DHCP logs also fill up pretty quickly with ISC DHCP, but I did also try Kea instead just in case that was the issue. But there are no logs to be seen on Kea.
I will try to spin up the Windows DHCP server again and do a packet capture there. pfSense is installed on a trusty old Dell R200 and has never caused us issues in the past 10+ years!
Are you saying you tried to plug directly into your router?
Plugging a device directly into the (pfSense) router works fine. When I tried OPNsense, I disconnected the pfSense server physically from the Core Pro Switch and plugged the OPNsense box into the Office Switch. The issues were just the same even with OPNsense doing DHCP. The moment I unplugged the office switch from the wall (the uplink port to the core switch), the laptop I was testing with instantly got an IP. I then tried to physically unplug everything one by one from the Core Pro Switch, but nothing made any difference. I’ve attached the topology to explain the layout.
Where is your DHCP server located in this network map?
Have you tried having just your DHCP server connected to your main Pro switch, with all the other switches disconnected? Plug in an endpoint and verify it is working. If so, then add each switch back one at a time and verify DHCP is still operational. I assume your DHCP scope isn’t filled up (all in use) too?
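For the "add one switch, then verify DHCP" step, a small probe that sends a DHCPDISCOVER and waits for any reply can save some replugging of laptops. The following is a rough sketch under several assumptions: it needs root to bind UDP port 68, it may conflict with a DHCP client already running on the machine, the broadcast leaves via whichever interface the OS picks, and the MAC address in the usage note is a made-up locally administered one. The names `build_discover` and `wait_for_reply` are hypothetical.

```python
import os
import socket
import struct
import time

MAGIC_COOKIE = b"\x63\x82\x53\x63"


def build_discover(mac, xid):
    """Build a minimal DHCPDISCOVER payload for a 6-byte MAC address."""
    if len(mac) != 6:
        raise ValueError("mac must be exactly 6 bytes")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,       # op: BOOTREQUEST
        1,       # htype: Ethernet
        6,       # hlen: MAC length
        0,       # hops
        xid,     # transaction ID
        0,       # secs
        0x8000,  # flags: ask the server to broadcast its reply
        bytes(4), bytes(4), bytes(4), bytes(4),  # ciaddr, yiaddr, siaddr, giaddr
        mac,     # chaddr, padded to 16 bytes by struct
        b"",     # sname
        b"",     # file
    )
    options = bytes([53, 1, 1, 255])  # option 53 = DISCOVER, then end
    return header + MAGIC_COOKIE + options


def wait_for_reply(mac, timeout=5.0):
    """Broadcast a DISCOVER; return the replying server's IP, or None."""
    xid = int.from_bytes(os.urandom(4), "big")
    deadline = time.monotonic() + timeout
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.bind(("", 68))
        sock.sendto(build_discover(mac, xid), ("255.255.255.255", 67))
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                data, (server, _port) = sock.recvfrom(2048)
            except socket.timeout:
                return None
            # A BOOTREPLY (op == 2) carrying our xid is normally the OFFER.
            if len(data) >= 240 and data[0] == 2 and data[4:8] == xid.to_bytes(4, "big"):
                return server
```

After reconnecting each switch, calling `wait_for_reply(bytes.fromhex("020000000001"))` a few times and noting when it starts returning `None` would point at the segment where replies stop arriving.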
From your description, it’s not the DHCP server that is the issue, but the connection back to the DHCP server from the endpoints.
Also, is there a reason for all this cascading of switches? These should be going back to the main aggregation link. Why do you have an aggre2switch with nothing connected?
UniFi OS Server does appear to have resolved this now. After I upgraded to it from the regular Network application, everything DHCP magically got fixed within 15 minutes of the upgrade. Over the next few days, there were a couple of minor issues, mostly if not exclusively on Wi-Fi devices. But I then discovered (after a duplicate IP notification) that another Debian VM was still suffering from the previous symptoms under the old network controller and was showing tons of IP addresses in XOA. A quick reboot of the VM and the DHCP indicator in the UniFi dashboard was back up to 100% within a few hours. And today both DHCP and DNS are at 100%. They were at 100% when I first upgraded but went as low as 80% under the OS Server. They were previously as low as 49% and never much more than 70%.
I don’t know what went wrong, but I am 100% convinced that it was the Network application. It coincided with the upgrade to 9.4.17, if I recall correctly, but got much worse soon after upgrading to 9.4.19. I have another site which uses the UniFi Dream Machine Pro and it had no issues at all. The new UniFi OS Server is great, but it still doesn’t work in the iOS app for some reason (it’s fine on Android). It’s a bit unnerving when DHCP fails you like this at the network level, particularly because I relied on a lot of reservations in pfSense.
@randybell In answer to your question, the DHCP server is located on pfSense, which is connected to a trunk port on the main Pro core switch. But I also tried Windows DHCP server and OPNsense DHCP. There are two ranges on the LAN: one is at 22% utilisation and the other is at 25%. There are 155 addresses in total in the first range and 253 in the second.
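As a quick sanity check that scope exhaustion is not the problem, the figures above translate into roughly this many leases in use and free. The helper name is made up, and the results are only approximate since the percentages are rounded.

```python
def leases_in_use(pool_size, utilisation_percent):
    """Approximate number of leased addresses from a rounded percentage."""
    return round(pool_size * utilisation_percent / 100)


# (pool size, utilisation %) as quoted in the post above
pools = {"first LAN range": (155, 22), "second LAN range": (253, 25)}

for name, (size, pct) in pools.items():
    used = leases_in_use(size, pct)
    print(f"{name}: about {used} of {size} in use, about {size - used} free")
```

Both ranges have well over a hundred free addresses, so a full scope does not explain the missing leases.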
The cascade of switches is due to a less than ideal physical site layout, but this has never caused issues before, other than easily resolvable STP issues with third-party unmanaged switches. The aggregation switch is temporarily being used as a storage interconnect to connect a Synology and a TrueNAS server to two XCP-ng hosts. Not ideal, but there aren’t yet enough ports on the servers for a direct link. The uplink on the aggregation switch is purely for management, and the storage network is on its own VLAN with no DHCP at all.