Banana Jack wrote: Thanks sirhc - at the moment I'm suspecting that a network loop and/or bad plug/cable/radio is to blame... Today I've been enabling loop protection on selected switches and it seems to be pointing me toward a certain part of the network based on the ports on which loops are being detected. So maybe I'm making some progress.
Just to update you, sirhc, and anyone else who's interested, for the record: this problem turned out to be an apparently faulty Mikrotik S-85DLC05D SFP module which was in our main WS-26-400-AC switch. Wireshark showed a broadcast spike (2000-3000 PPS) every minute, sometimes with additional smaller spikes in between. Swapping out the SFP module immediately cured it - screenshot attached.
The Netonix 'Loop Protection' feature was useful in showing up the problem and mitigating it in some parts of the network (to the detriment of the downstream parts), although the port it reported as having the problem sometimes led me up blind alleys. The breakthrough came when I set Broadcast Storm Control to 2K on our main switch: the switch freaked out and crashed (with syslog kernel err FAIL: RX Alloc 1866 bytes), and when it came back up the SFP link remained down. I had to unplug and replug the SFP several times to get the link up, which made me suspicious, so I swapped it out. That cured the broadcast spikes, as confirmed by Wireshark's extremely useful Statistics > I/O Graph feature. Hopefully our switches will now stop watchdog-rebooting.
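For anyone who wants to check for the same pattern without eyeballing the graph, here is a rough Python sketch of the idea behind what I was looking at: count broadcast frames per second and flag the seconds that go over a threshold. It is not what Wireshark does internally. The function names, the 2000 PPS threshold and the synthetic packet list are my own assumptions for illustration; in practice you would feed it (timestamp, destination MAC) pairs exported from a real capture.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def broadcast_pps(packets):
    """Count broadcast frames per whole second.

    packets: iterable of (timestamp_seconds, dst_mac) tuples
    (hypothetical input format, e.g. exported from a capture).
    """
    counts = Counter()
    for ts, dst in packets:
        if dst.lower() == BROADCAST_MAC:
            counts[int(ts)] += 1
    return counts


def find_spikes(counts, threshold=2000):
    """Return the seconds whose broadcast count meets the threshold."""
    return sorted(sec for sec, n in counts.items() if n >= threshold)


# Synthetic capture (assumed data): light background broadcast traffic,
# some unicast, and a 2500-frame broadcast burst once a minute.
packets = []
for sec in range(180):
    packets += [(sec + 0.1 * i, BROADCAST_MAC) for i in range(5)]
    packets.append((sec + 0.5, "00:11:22:33:44:55"))  # unicast, ignored
    if sec % 60 == 0:
        packets += [(sec + i / 2500, BROADCAST_MAC) for i in range(2500)]

counts = broadcast_pps(packets)
spikes = find_spikes(counts)
print(spikes)  # [0, 60, 120]
```

A regular gap between the flagged seconds (60 here) is the same once-a-minute signature I saw in the I/O Graph before swapping the SFP.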
Thanks
Glenn