Troubleshooting intermittent client disconnections

Have a doozy…
I have a client.
They have a very basic setup. 10 users, a server, a few printers all connected to an unmanaged switch. The switch is then connected to a DSL carrier modem/router (I know…don’t get me started)
About 2 months ago, they started to experience DSL problems. It had been rock solid for years. I got the DSL provider to check the line. They did (as far as the front door) and all checked out. I then bridged the router into a modem and put an EdgeX router behind it with a basic setup. Again, problems still occurred but not as frequently.
I noticed that the LEDs on the modem were very faint so I swapped it out with a similar modem. The problem appeared to stabilise for a bit.
I then put in a pfsense box about a week ago with a 4G (LTE) failover modem connected to one of the ports. That appeared to work for a couple of days and then during the week both Internet connections dropped. I ended up going on site, pulling the pfsense box and putting the 4G modem back into router mode and ran the network off that. It was slow but it did provide internet access.
Over the weekend, I went back and double checked everything. I had brought home the pfsense to check the config, just to make sure I hadn’t set up anything wrong. I hadn’t but I did a complete reinstall. While onsite, I noticed that the WiFi was down. I checked the access point (which is relatively new) and wasn’t able to access it by way of the web configuration page. I tried a hard reset but no luck. The access point appears to be borked.
This morning I logged into the network remotely and saw that the WAN is down but the failover 4G modem (back in bridge mode) is up.
But here’s the really interesting thing. Anyone who uses screenconnect for remote access knows that it connects back to a connectwise instance. All the machines have screenconnect on them. I was able to check the connection ‘timeline’ of each of the screenconnect clients. This would show if there was a break in the connection between the client and the connectwise server.
During last night.
4 of the client computers were relatively stable. The connection to the screenconnect server had a few small disconnects, we’re talking maybe 30 secs. This proves there was an internet connection.
5 of the others experienced drops of between 7 and 40 minutes. They are all in different parts of the building.
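
For anyone wanting to make the same comparison across machines, here is a rough Python sketch. It assumes you have already copied each client's disconnect/reconnect times out of the timeline by hand into lists of timestamp pairs; the data layout, the sample values and the function names are all made up for illustration and are not anything screenconnect provides.

```python
from datetime import datetime, timedelta

def summarise_outages(events, short_limit=timedelta(minutes=1)):
    """Summarise a list of (disconnected_at, reconnected_at) datetime pairs."""
    durations = [end - start for start, end in events]
    return {
        "count": len(durations),
        "longest": max(durations, default=timedelta(0)),
        "total": sum(durations, timedelta(0)),
        # Anything longer than short_limit is treated as a real outage
        "long_outages": [d for d in durations if d > short_limit],
    }

def overlapping(a, b):
    """True if two (start, end) outage windows overlap in time."""
    return a[0] < b[1] and b[0] < a[1]

# Hypothetical sample data for two machines
t = datetime(2024, 1, 1, 2, 0, 0)
pc_a = [(t, t + timedelta(seconds=30))]
pc_b = [(t - timedelta(minutes=5), t + timedelta(minutes=20))]

summary_a = summarise_outages(pc_a)
summary_b = summarise_outages(pc_b)

# Outage windows that hit both machines at once point at something shared
# (WAN, router, switch, power) rather than at a single PC.
shared = [(x, y) for x in pc_a for y in pc_b if overlapping(x, y)]
```

Long outages that never line up with another machine's outages would suggest something local to that PC or its cable run instead.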

So the internet is a problem for sure but there’s more going on here.
I’m thinking it might be power surges, based on the 2 other devices (modem and wireless access point) proving troublesome and needing to be replaced.

One other point that I haven’t mentioned but might be pertinent is that there was renovation work being carried out in an adjoining unit during this period. They stopped about 2 weeks ago but I’m just wondering if there’s anything they could have done to somehow alter the power.

I hope my explanation is somewhat coherent. It would be great to hear any ideas or theories. This has worn me down.

Thanks.
Niall

ps. does anyone know or have experience of using power monitoring devices?

I would put a good UPS in place to power all the internet equipment and see what happens. Might be too late depending on the nature of the power issues and everything is already fried. We’ve been having an increase in power related issues at work, and the equipment that isn’t protected with a UPS is starting to show issues (again).


Thanks @Greg_E I’m going to be ordering one tomorrow. Just need to get it signed off.
It doesn’t explain the PCs dropping their connections intermittently but it’s a good suggestion to at least protect that equipment…even if it might be too late.
I’ve been told by another carrier that they have strong 5G coverage in that location so I’m going to be ordering my first 5G modem tomorrow…they are pricey!
This will of course be plugged into a surge protected socket :wink:

If you are in the USA, and T-Mobile has a good 5g connection, get them to sign up for a B2B plan with a bring your own device. You can also maybe get a static external IP with the B2B plans.

Then look at the Chester Tech. Cheetah, you can get it for other carriers, but mine is for T-Mobile home internet and works a lot better than the supplied modem (phone). Ease of connecting exterior antennae might be worth the price.

Have you done any testing which completely bypasses the unmanaged switch? I have seen those go bad. If you can, plug something directly into the modem or router. See if the issue recurs for that device. Alternatively, put a new or different known good switch in there. Worth a shot. Good luck. I know networking issues can be frustrating.
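
One way to tell a switch/LAN fault from a WAN fault is to ping both the local router and an outside host from the same PC and compare the results. A minimal Python sketch of that idea follows; the function names and the labels are my own, and it assumes the router answers ICMP at all (some are configured not to).

```python
import platform
import subprocess

def ping_once(host, timeout_s=2):
    """Return True if a single ping to host succeeds (uses the system ping command)."""
    # Windows ping uses -n for the count, most other systems use -c
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0

def classify(gateway_ok, internet_ok):
    """Turn the two ping results into a rough guess at where the fault is."""
    if gateway_ok and internet_ok:
        return "ok"
    if gateway_ok:
        # Local path is fine, so the problem is upstream of the router
        return "wan-side fault"
    if internet_ok:
        # Traffic is getting out, so the router is probably just ignoring pings
        return "inconclusive"
    # Cannot even reach the router: switch, cable, NIC or the router itself
    return "lan-side fault"
```

Run something like `classify(ping_once("192.168.1.1"), ping_once("1.1.1.1"))` every minute from a PC on the switch and from one plugged straight into the router, log the results with a timestamp, and compare. The gateway address here is only a placeholder for whatever the router's LAN address is.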


Thanks @Greg_E . I’m in Ireland. No chance of getting a static IP. CGNAT implemented by all the carriers but I don’t really need a fixed IP at the moment. I can get through using screenconnect if they fall back onto the current 4G (LTE) modem.

@SKTC_Sean I was thinking about the switch. There is a spare on site but of course some genius plugged it in so it was open to the same power fluctuations (if that’s what is causing this problem). I’m going over this evening with the 5G modem and at the same time, I’m going to swap the switches and put in a surge protection power strip.
I’ll come back with an update in the next couple of days.


@NiallCon - at least you have a few more ideas to consider. Looking forward to hearing the results!

I second the idea of swapping out that switch and also adding a UPS with automatic voltage regulation for all the network equipment. You might be experiencing intermittent brown-outs from other equipment in the building or a shoddy power grid situation.

Anything that has a switching power supply doesn’t need the best sine wave output from a UPS, but that doesn’t stop me from buying the higher end stuff that is usually medical certified. Pure sine wave used to be very important when I had more analog video and audio equipment plugged in, but I still consider it important for the computers. I’ve stuck with APC for a long time and replaced a lot of batteries (every 3 years if you can get the budget). I’ve also bought cheaper third party batteries that last better than what APC uses.

I had a kind-of similar problem but it wasn’t restricted to IT equipment - lots of random faults across everything that required electricity. In that case we ended up approaching the DNO and they installed a gadget onto the main incoming supply, basically a series of hall-effect current monitors that you attached to each supply line (note: 3-phase supply) and a data logger. After a week we extracted the data-logger files and could see that the supplies were all over the place … voltages from 150V up to 260V (well out of spec), line drops i.e. to 0V and material levels of current down both neutral and earth due to phase load disparities (brownouts) and fault conditions.

The farm was unusually (in the UK) fed by its own sub-transformer (big box on top of a pole, far more common in the US). They replaced that and all the problems went away.

There was talk about water ingress and material failure, but I never read the final report.

The tool seemed very basic, I’ll bet you could build one with a data-logging microcontroller (e.g. Adafruit’s ‘adalogger’) and a few off the shelf current sensors. Or ask your DNO, actually, probably best to ask your DNO!
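
If you do end up with a file of logged voltage readings, picking out the bad ones is only a few lines of Python. This is a sketch only: the (timestamp, volts) layout and the sample values are invented, and the 230V ±10% band and the 10V dropout threshold are my assumptions, so check what limits your own DNO quotes.

```python
def classify_voltage(volts, nominal=230.0, tolerance=0.10, dropout_below=10.0):
    """Label one RMS voltage reading against an assumed nominal +/- tolerance band."""
    if volts < dropout_below:
        return "dropout"          # effectively a line drop to 0V
    if volts < nominal * (1 - tolerance):
        return "low"              # brownout territory
    if volts > nominal * (1 + tolerance):
        return "high"
    return "ok"

def find_events(samples, **limits):
    """Return (timestamp, volts, label) for every reading that is not 'ok'."""
    events = []
    for timestamp, volts in samples:
        label = classify_voltage(volts, **limits)
        if label != "ok":
            events.append((timestamp, volts, label))
    return events

# Hypothetical readings, one per minute
samples = [("02:00", 230.0), ("02:01", 150.0), ("02:02", 260.0), ("02:03", 0.0)]
events = find_events(samples)
```

Lining up the timestamps of these events against the times the network kit dropped out is what would actually support or rule out the power theory.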

Unfortunately, I have found that not every provider employee understands the issue when you contact them with a problem like yours. It reminds me of the motto “the light is green - we don’t see any problems”. :thinking: At the beginning of a provider change I also had the phenomenon that the line disconnected irregularly and reconnected again after a short time.
After the provider’s technical department analyzed the last 24 hours, it became clear that the “attenuation” of the line was incorrect and that the line had therefore disconnected. That’s one of my experiences, but I can’t guarantee that it will solve your problem.

Good luck
Andy

Not going to lie…had to consult my friend the G-man on what DNO was…lol
Very interesting though! I’m going to get on it in the morning. I’ve just come back from the site.
Swapped out the switch and put everything on surge protected socket strips. The existing ones had no surge protection. I wonder do they protect against undervoltage? I messaged the owner on the way home. He said that he is meeting with a new property management company during the week and they are looking into getting a dedicated supply for just his part of the building. I did notice while I was there a sign on the wall which was surrounded by LEDs. They were definitely pulsing intermittently. The main lights weren’t but I guess the LEDs might be more susceptible to smaller fluctuations.

Also installed a Teltonika 5G modem. Expensive but looks to be a solid bit of kit (first one purchased). Was getting 180-200Mbps consistently with fairly low latency so hopefully that will do until the other issues are sorted. While I was there the main DSL was up but it didn’t take long before it went off again.

Anyway, I’m going to monitor the connection for the next 24 hours and see how it goes.

Thanks for the input!

Yes @Andy80, the girl I was dealing with was definitely one of those…lol. The problem is they are nothing but a reseller and have no access to 2nd/3rd line support. Definitely time to change but need to get everything else ruled out first.

Ahh, yes, apologies - I couldn’t actually remember what it stood for myself at the time of writing, and was too lazy to find out! (Distribution Network Operator). In the UK, the DNO is responsible for the wires going into your premises from the National Grid.

The surge protectors do not protect against low voltage (or gradual high voltage depending on design), you’ll need a decent UPS for that job. In theory the surge protectors only protect against a spike, they all use an MOV, here’s a white paper for those that want to dig deeper:

https://www.mouser.com/pdfdocs/bourns-tips-on-selecting-the-right-mov-surge-suppressor-white-paper.pdf

Some of the surge protectors have a little more circuitry, but not much. All should have a fuse or circuit breaker.

Thanks @Greg_E
We’ve just had a storm over here so busy day. I think my brain might explode if I read that document.
What would be your UPS of choice? I’ve heard good things about Eaton. Probably only need a small one that would support small voltage devices. Firewall/Switch/Modem etc.
I presume it’s ok to plug a power strip into the UPS because some of the devices don’t have the standard power cable (we call them Kettle leads over here).

Eaton, Schneider, etc. big name manufacturers might be best to stay with. In the USA some of them come with hardware protection warranty. They will have the same voltage spike protection, and will protect against low voltage and brown outs.

I always buy the “smart” type and usually the “pure sine wave” style which is often sold as “medical” certified. I wouldn’t buy one less than about 1000-1500 VA. Something like this (UK shown)

You can choose other features like USB connection, network connection, etc., but the price goes up with each option. I also like to buy the style that allows adding extra battery packs, I run a second pack on most of mine in the server areas, hoping to get past about 2 hours so the outage might be restored (no generator for my stuff). I’d say 10-12 years on the device, change the battery every three years or four (not more than 4 years or you will have a mess!).

Even with a network connection, my campus IT department never seems to know when they have a battery go bad, so not sure how much info you can get from it.

Other people will probably suggest other things, some of them good and cheaper, some of them just cheap. When money is on the line, I’m sticking with APC as they have a long track record, and a long record with me, so far, so good.

You could probably go with a 600-750VA device to bring the cost down, but I’m not sure about battery replacement in the smaller units. It starts to get to be a larger percentage of the device, so just toss the entire device after a few years.
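
As a back-of-envelope check on the VA figures, you can add up the wattage of the devices and convert. The sketch below is only that: the device wattages are guesses for illustration (read the real numbers off the power bricks), and the 0.6 power factor and 25% headroom are assumptions, not anything from a manufacturer.

```python
def required_va(device_watts, power_factor=0.6, headroom=1.25):
    """Estimate the minimum UPS VA rating for a set of loads.

    device_watts: dict of device name -> power draw in watts
    power_factor: assumed watts-per-VA ratio of the UPS rating
    headroom:     safety multiplier so the UPS is not run at 100% load
    """
    total_watts = sum(device_watts.values())
    return total_watts * headroom / power_factor

# Hypothetical loads for a small network cabinet
load = {"firewall": 20, "switch": 15, "dsl_modem": 10, "5g_modem": 15}
needed = required_va(load)   # roughly 125 VA for these example numbers
```

For loads this small almost any unit has enough capacity, so the larger VA rating is really buying runtime and features rather than headroom. Runtime depends on the battery, so the manufacturer's runtime chart is the thing to check.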

I also like the smart UPS with the LCD on the front, makes it easier to set battery date, and monitor load and incoming power. Worth the extra few dollars for me.

I’d probably talk to the local manufacturer’s office and see what they suggest, they might suggest a lower featured model than the Smart line of devices.

@NiallCon
I hope I’m not destroying your idea that going to the provider directly is better … I discussed / tested my issue with a reseller and was lucky that the technician recognized the reported problem and we checked the line attenuation. It must be said that I have a business modem in front of my firewall, which does not come from the reseller, and therefore he could only see and evaluate the DSL port in the distributor, since my modem has no remote management / configuration activated.

Hope you get your problem solved soon

Regards
Andy

Thanks @Greg_E. Food for thought for sure.
I’ve always regarded APC as a little on the pricey side. Yes, it’s taken me longer than it should to figure out that, more often than not, you get what you pay for.
Great to get some positive (and experienced) feedback.
I will definitely take all of that on board.
Much appreciated.