Massive Packet Loss at 1 Location - mDNS multicast storm - 3CX, Yealink, and UniFi switches

We have four locations. Only one location is experiencing heavy packet loss (10-15%) with a particular VoIP phone deployment that has been in place for years.

We have narrowed the phones down to one switch, VLAN'ed them and segmented them off into small groups, and tried many different switches, even different brands. We have even shut down the entire building's electrical and fired up just the switch to eliminate any interference.

We have replaced phones, replaced switches, and reorganized the network… nothing fixed it. When we take a phone and shut off the native VLAN, the problem stops. But once we turn that back on, it's as if the phone is listening on the native VLAN and getting bogged down by something, even in small groups.

So we have narrowed this down to a multicast storm, coming in particular from the Windows machines in the building. Not sure why Windows 11 would be so mDNS-intensive, but in some quick tests we noticed gigabytes of data flowing out of the client side across the network.
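
If anyone wants to quantify this on their own network, a tshark run along these lines is a reasonable way to find the top talkers (swap in your own capture interface; the five-minute window is arbitrary):

tshark -i <capture-interface> -f "udp port 5353" -a duration:300 -q -z endpoints,ip

The endpoints table it prints shows packets and bytes per IP for just the mDNS traffic, which makes the worst offenders obvious.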

mDNS Traffic Spikes: More recently, I've noticed that mDNS traffic is going wild on my network, and I'm trying to understand why it's happening and whether it's contributing to the packet loss issues. I've read that mDNS is useful for local name resolution, but in my case it seems to be generating excessive multicast traffic.

Why would these computers be calling out so much?

To reduce it, we started by eliminating IPv6 on those machines. That did not reduce the mDNS itself; it just fell back to IPv4. But it did make the problem manageable, because I could then easily block it on the switch itself with ACL commands targeting 224.0.0.251 port 5353. The commands are at the bottom of this post, for UniFi USW 48-port PoE switches… the UniFi documentation on this is not great.

But in another wing of the network, another building connected by fiber, we haven't removed IPv6, and we see mDNS coming through over that. I've attempted to block the IPv6 side, but I can't get granular enough to be specific; instead I was blocking all UDP traffic over IPv6, and I just read that's not recommended (there's a sketch of what I'm aiming for at the end of this post).

Any thoughts on what's going on here? Why are these Windows machines contributing so much chatter and disruptive multicast that the phones bog down to the point of dropping packets and becoming sluggish?

I think it's pretty insane that we have to block mDNS at the switch level or go to each PC. It's easier to do with IPv4 than IPv6: to accomplish this we have to switch off IPv6 and just block IPv4 224.0.0.251 traffic, or shut down the service on 100-200 machines one by one.


Odd issue, not one I have encountered. Have you tried having all the Windows systems off for a bit and seeing if the problem still occurs?

We have tried many troubleshooting ideas. We’ve replaced every piece of hardware and reconfigured every possible setting.

We even thought maybe we had some electrical interference, so we shut down everything in the server room after hours and powered up just one 48-port switch with a test PC, with the PoE phones powered up remotely. We still saw the issue on that one switch with some devices. We then swapped that switch out a couple of times, even with other brands, and nothing changed. We also started shutting down some of the overhead lighting to take that out of the equation. The one thing we didn't do was shut down the PCs around the building, so that test was a failure…

Then we had the staff shut down ALL PCs at closing time until everything was off. The problem was gone 100%; that's how we narrowed it down to those machines. Nothing crazy is on the machines: a fresh Windows 11 install, Dropbox, and a Windows 11 optimizer (see settings below).

Ninite install…
Chrome, Firefox, VLC, all runtimes, Foxit Reader, LibreOffice, FileZilla, Notepad++, PuTTY, 7-Zip, Dropbox.

We went to each PC in one wing of the business and disabled IPv6, so all the traffic was on IPv4… then we could easily block it with ACLs via custom CLI commands on the UniFi switches (really not much documentation available).
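
For anyone wanting to script that step instead of clicking through adapter properties on each machine, an elevated PowerShell one-liner along these lines should unbind IPv6 from every adapter (a sketch of the general approach; try it on one PC first):

Disable-NetAdapterBinding -Name "*" -ComponentID ms_tcpip6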

Then for the other building, we thought we would try to stop the problem at its root on the PCs by shutting off mDNS, so we went to this site… mDNS - The informal informer | f20
We used this guide to disable mDNS on all of the PCs.
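
For reference, the heart of that change (as I understand the guide; double-check against it before rolling anything out) is a registry value that tells the Windows DNS client to stop participating in mDNS, followed by a reboot:

reg add "HKLM\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters" /v EnableMDNS /t REG_DWORD /d 0 /f

Note this only affects the Windows resolver; applications that ship their own mDNS responder can still chatter.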

My point is, it's pretty insane that we had to go to all this trouble, handling this by going to each client PC. I'd like to find and isolate exactly why the PCs are calling out so much over mDNS; it's creating a massive amount of traffic.

Another thing we noticed at this site, which we didn't see at the other sites, is how packets would fragment in the Wireshark captures… This problematic site has USW 48 PoE Pro switches, and the other sites have the Gen2 US 48 PoE switches (but we've tried both in isolation and can't say it's the switch).
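
If anyone wants to look for the same thing in their own captures, these are standard Wireshark display filters (nothing site-specific about them): the first isolates the mDNS, the second shows fragmented IPv4 packets.

udp.port == 5353
ip.flags.mf == 1 || ip.frag_offset > 0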

Anyway, those are my thoughts. Any ideas or thoughts of your own would be much appreciated!

Here are our Windows 11 optimizer settings…

Also, here is an article/podcast about a guy who experienced a similar situation with a large Wi-Fi deployment, with mDNS, LLMNR, etc.

Here is a dump from one of the PCs that was a top offender, and here is what we saw it calling out to. How this magnifies into massive traffic on the network is beyond me…

What makes it go away is… on each switch…

configure
access-list 100 deny udp any 224.0.0.251 0.0.0.0 eq 5353
access-list 100 permit ip any any
interface 0/1-0/48
ip access-group 100 in
exit
exit
write memory

But this only gets rid of the IPv4 noise, not the IPv6; if you turn off IPv6 everywhere as well, then you've got it wrapped up.
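
For the IPv6 side, something like the following is the direction I've been trying to go: target just the IPv6 mDNS group ff02::fb on UDP 5353 instead of dropping all IPv6 UDP. This is FASTPATH-style syntax that I have not confirmed the USW CLI accepts, and the ACL name is arbitrary, so treat it as a sketch and test it on a single port first:

configure
ipv6 access-list mdns-v6
deny udp any host ff02::fb eq 5353
permit ipv6 any any
exit
interface 0/1-0/48
ipv6 traffic-filter mdns-v6 in
exit
exit
write memory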