I meant that it is hard for MikroTik to debug and fix the problem if they can’t easily reproduce it.
If you’re a programmer, you know that to fix a problem, you have to know what is causing it. Then, once you know, it can be a simple fix or, if the original algorithm’s assumptions were incorrect, it may be a major overhaul.
Problems are also often hard to reproduce when debugging, because the code generated when compiling with debug support is different (usually with compiler optimizations turned off, so that code flow is more likely to mirror the source code). So things like buffer overflows can have different outcomes, race conditions may be less likely to trigger, etc.
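To make the timing point concrete, here is a toy Python sketch. It is not real firmware and has nothing to do with MikroTik's code; the function `run_schedule` and the two schedules are made up for illustration. It replays two workers incrementing a shared counter under two fixed interleavings, standing in for how a slower debug build and a faster optimized build might order the same steps.

```python
def run_schedule(schedule):
    """Replay a fixed interleaving of two workers incrementing a shared counter.

    Each entry is (worker, step), where step is "read" or "write".
    The schedule is a stand-in for whatever timing the real build produces.
    """
    counter = 0
    local = {}
    for worker, step in schedule:
        if step == "read":
            local[worker] = counter        # worker copies the shared value
        else:
            counter = local[worker] + 1    # worker writes back its copy plus one
    return counter


# Assumed "debug-like" timing: each worker finishes before the other starts.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# Assumed "optimized-like" timing: both workers read before either writes.
overlapped = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run_schedule(serial))      # both increments land
print(run_schedule(overlapped))  # one increment is lost
```

Same code, same bug, but only one of the two timings shows it, which is why a problem can vanish as soon as you build for debugging.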
Ubiquiti evidently had some problems in the EdgeSwitch line too, but it seems they finally got it mitigated. I'm not sure if it ever got fixed; I think it involved a watchdog and reset. In other words, I am not sure if they just treated the symptoms (reset when the problem is detected) or fixed the root cause of the problem, which is the only thing I consider to be a real fix. I think they started to create crash dump files when the problem was detected, and possibly even collected some telemetry data on crashes (which upset some customers).
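As a rough sketch of what I mean by treating the symptom, here is a simulated watchdog in Python. This is purely illustrative and is not how Ubiquiti or MikroTik implement anything; the `Watchdog` class, the `run` function, and the tick-based timing are all assumptions of mine.

```python
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    """Hypothetical software watchdog: resets the device if not fed in time."""
    timeout_ticks: int
    ticks_since_feed: int = 0
    resets: int = 0
    crash_dumps: list = field(default_factory=list)

    def feed(self):
        # Healthy firmware calls this regularly to prove it is still alive.
        self.ticks_since_feed = 0

    def tick(self, state):
        # Called once per timer tick; returns True if a reset was triggered.
        self.ticks_since_feed += 1
        if self.ticks_since_feed >= self.timeout_ticks:
            # Save a snapshot first, so the root cause can be looked at later.
            self.crash_dumps.append(dict(state))
            self.resets += 1
            self.ticks_since_feed = 0
            return True
        return False


def run(total_ticks, hang_at, timeout_ticks=3):
    """Simulate firmware that hangs at tick `hang_at` (stand-in for the unknown bug)."""
    wd = Watchdog(timeout_ticks)
    state = {"tick": 0, "hung": False}
    for t in range(total_ticks):
        state["tick"] = t
        if t == hang_at:
            state["hung"] = True   # the bug strikes; the main loop stops feeding
        if not state["hung"]:
            wd.feed()
        if wd.tick(state):
            state["hung"] = False  # the reset clears the symptom, not the cause
    return wd


wd = run(total_ticks=10, hang_at=4)
print(wd.resets, wd.crash_dumps)
```

The device recovers after the reset, but nothing in this loop stops the hang from happening again. The saved snapshot is the only part that helps anyone find the actual cause.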
But I don’t own any EdgeSwitches, and the only MikroTik switch I own is the CSS106-5G-1S (RB260GS).
I don’t think there is anything you can do to debug the problem yourself, but if you have info on how to increase the chance of it happening, you should open a ticket with MikroTik and provide them that info, because the fix requires being able to find the problem. I don’t know if MikroTik even has the option to create crash dump files on the CSS326; I think it is pretty resource-limited if it is like the CSS106-5G-1S (which is a switch chip with a low-performance microcontroller to init the chip and poll it for counters). I don’t think it has much RAM or flash.