IPsec suddenly stops working

I need a little help or advice if possible. I currently have 4 sites that were all running pfSense 2.4.5p1, with IPsec connecting all 4 together without any major issues.

Internal IPs in /24s using 172.16.0.x, 172.16.1.x, 172.16.2.x and 172.16.3.x.

With the release of 2.5.0 I ran the upgrade on 172.16.0.x (which is ideally a test-lab location), which kinda screwed up (I know, should have clean installed…). That environment was using a Lanner box running an older Atom processor which is pretty much end-of-life, so I have some WatchGuard Firebox XTM 5s with C2D processors and 4 GB RAM. They were my short-term upgrade path for greater use of IDS, as the Atom ran too high on utilization when doing a lot…

Built the XTM5, restored a configuration and, after a lot of tweaking, got it running with all packages and IPsec tunnels. No biggie, it just took longer and was a little more complex than I had hoped.

Herein lies the issue… After running for a while, IPsec at that location just appears to stop: the VPN is offline, and clicking connect from there or from one of the other sites doesn’t resolve anything. Clicking stop in the GUI doesn’t stop it, and restart also seems to do nothing. I am unable to run ‘swanctl --list-conns’ or ‘swanctl --load-all --file /var/etc/ipsec/swanctl.conf --debug 1’, as neither responds with anything.

If I reboot, all is good for a while until the same happens again.

Believing the issue was with 2.5.0, I just rebuilt that system on 2.4.5p1 and restored some config to keep my IPsec tunnels, interfaces, NAT, firewall rules and so on and so forth. The system was up and running from midnight.

Just realized a short while ago that the tunnel is now not responding again. The Internet connection is not dropping, as I still have remote access to computers at that location. Logged into the firewall and checked Status > IPsec, which shows the usual "collecting information" and then nothing. Opened a shell and cannot issue swanctl commands, just like before. Checking the IPsec log from the shell shows corruption occurring at 10:52:
Apr 15 10:52:04 FCU-Group-FW charon: 11[IKE] <con1000|5> activatCLOG^A^@^@^@\xc2\xf2^A^@\xec\xcd^G^@^@^@^@^@
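
To pin down exactly where a log like the one above turns into binary data, a small script can scan a copy of the file for non-printable bytes. The following is only a sketch: the function name `find_binary_lines`, the `CLOG` marker default and the definition of "printable" are illustrative choices, not part of pfSense. One caveat, offered as a possibility rather than a diagnosis: pfSense 2.4.x stores its logs in a binary circular (clog) format, so trailing bytes that begin with `CLOG` may be that file's footer rather than damage, in which case the interesting fact is simply that the last message was cut off mid-word.

```python
# Bytes treated as ordinary log text: printable ASCII plus tab and CR.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0D}


def find_binary_lines(raw, marker=b"CLOG"):
    """Return (line_number, readable_prefix, non_printable_count) for every
    line of `raw` that contains non-printable bytes or the marker."""
    findings = []
    for number, line in enumerate(raw.split(b"\n"), start=1):
        bad_positions = [i for i, byte in enumerate(line) if byte not in PRINTABLE]
        marker_at = line.find(marker)
        if not bad_positions and marker_at == -1:
            continue  # ordinary text line, nothing to report
        # Cut the readable part at whichever comes first: marker or binary byte.
        cut_points = bad_positions[:1]
        if marker_at != -1:
            cut_points.append(marker_at)
        prefix = line[:min(cut_points)].decode("ascii", errors="replace")
        findings.append((number, prefix, len(bad_positions)))
    return findings
```

Called as `find_binary_lines(open(path, "rb").read())` on a copy of the log, it returns the timestamped text that was written just before the binary bytes begin, which is the moment logging stopped.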

Can this ACTUALLY be hardware related to the XTM5 or am I missing something absolutely obvious??? I mean, I put it back to 2.4.5p1 so same version as the others etc…

Obviously I can’t change the others to 2.5.0 or 2.5.1 until I know for sure what is the root cause and ensure stability…

Any help would be greatly appreciated…!

Don’t have much knowledge of IPsec, but as a temporary fix, if you add the Service Watchdog package you ought to be able to add your VPN to it; when the service stops, the watchdog should restart it. You will probably still have to do a bit more digging to get to the root of the problem, though.

We have not had any stability issues with IPsec, but you may want to post over in their forums as well: https://forum.netgate.com/

Yep, I had already tried that, but it looks like the service doesn’t actually ‘stop’ (the service is still running); it just stops responding or working. Logging seems to just suddenly stop.
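
Because the process stays up while no longer answering, a check that only asks "is the service running?" will never fire. A liveness probe that runs a status command with a timeout can tell "hung" apart from "stopped". This is a minimal sketch, assuming Python 3 is available wherever the check runs; the function name `probe`, the 10-second default and the status strings are made up for illustration, and what to do on a "hung" result is left open.

```python
import subprocess


def probe(command, timeout_s=10.0):
    """Run `command` and classify the outcome.

    Returns "ok" (exit code 0), "failed" (non-zero exit code),
    "hung" (no exit within timeout_s) or "missing" (command not found).
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except FileNotFoundError:
        return "missing"
    return "ok" if result.returncode == 0 else "failed"
```

For the situation described here, `probe(["swanctl", "--list-conns"])` would be the natural call: a result of "hung" matches the symptom of the command never returning, and logging the time of the first "hung" result gives something to line up against the last entry in the IPsec log.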

Thanks, just a thought. I wasn’t sure if perhaps that particular Firebox, which appears fine, had an issue. I have 3 of them out of the 4 I was planning to replace, so I may just move to another and retest. I may also just build a 2.5.1 without any restore and manually add things back in. It's a pain in the ass to redo all the NAT and specific rules, but I guess it's the best way to test properly.