Switch(es) stop forwarding traffic
Posted: Sat Nov 17, 2018 7:00 am
Hi,
We have a very odd issue that has reared its head only in the last few weeks. Oddly, it is happening at multiple venues, but only to a single switch at each.
The issue is that the switch simply stops allowing any traffic to pass through it. We are unable to ping it, and looking at the ARP table on the router, the MACs of the switch and of every device other than the initial connection to the switch no longer have an IP address associated with them.
We are creatures of habit, so please allow me to set the scene: all our venues have basically the exact same setup, differing only in the size of the network.
All equipment is a Mikrotik router, UBNT Nanos (old sites being M5s, newer sites being AC), UBNT outdoor APs and Netonix WS-6-Mini switches. We manage around 500 Nanos over 27 sites using around 250 switches. Each Nano and switch uses a common build, and firmware versions range from 1.48 to 1.5.
The setup is pretty simple, as we provide public WiFi services in open public places. Simply put, the router connects to a Nano, which then connects to street furniture with a couple more Nanos connected to a switch and an AP, and on to the next street light, and so on.
So, hopefully that gives an idea of our setup. As I mentioned above, in the last 2 weeks we have had 4 switches stop working at 4 different venues, one of which only went live last week. In each case the switch simply stops passing any traffic. As far as we can tell the switch itself is working fine: the Nano connected to the switch in question, which also connects to the previous street light, is accessible and provides a link. We can log into that Nano but nothing else, and all equipment following this link is shown as offline in either aircontrol or Unifi manager. We are unable to log into the switch, and unable to ping it or anything else connected to it.
The four sites range in age from 1 week to 1.5 years, so the switches are all from different hardware batches. The old sites are running 1.47 and the latest 1.5 firmware. The only common thing is that they ALL run the same build. However, so do all the other switches that are working fine.
If the switch is rebooted, everything comes back fine for around 24 hours and then stops working again. To resolve things we have replaced the switch with another running the exact same build and config, and the issue goes away. So now I have three switches and I'm unsure whether they have a hardware issue or whether I could use them elsewhere because the cause was a config problem.
Even with everything we are doing / have done, I am convinced this isn't a hardware or firmware issue. I am leaning toward config, but what's confusing me is that we use the same config everywhere, so why is it only happening to a single switch at different venues?
I have an issue on one switch (FW 1.4.9) at the moment: it went off on Thursday, was rebooted yesterday and stopped working again early this morning. I'm unable to get to it until Monday, so this may be an opportunity to work out what the issue is.
Having googled the symptoms, I found a couple of old threads on the forum where people have had a similar issue, and "Flow Control" is mentioned as a potential cause in pretty much all of them.
I have checked the flow control settings on our other switches and they are all set to "both". I have also checked a couple of the Mikrotik routers, and flow control is switched off there. I know this will be the same everywhere since we use a standard build.
My question is in two parts really: do you think flow control could be the issue, and if so, what settings should I be using? If not... any ideas?
If configs will help, I can jump onto a working switch and paste screenshots. Based on past experience, even if I get to physically connect to a switch with issues, I can't get on it anyway :-(
Thanks in advance for any help / pointers.