WS-12-250-AC - Locking up?
Posted: Thu Jul 30, 2015 3:30 pm
by sbyrd
My switch has locked up twice now, and I have to reboot it remotely (using a web power switch) to get it back online. When it comes up it has constant RX oversize errors. However, I have tried upping the MTU on the switch to 1522 and 1580 with no effect.
My mikrotik router has an L2 MTU of 1580 and MTU of 1500. The switch is connected to the Mikrotik using a Mikrotik RJ-45 SFP module. Could the SFP be bad? It has been working great for months.
I am not sure if it is definitely locking up, as the Digital Loggers power switch watchdog has rebooted the switch before I can get out on site to see if I can access the switch locally with my laptop. However, the Mikrotik router showed the port as flapping. The switch and router are connected by a 1m premade ethernet cable and reside in the same outdoor box.
Could it be heat related? The switch is reading 54C,69C,69C. Fan speed of 9800RPM
Ask and I will provide more details as I am not sure what else is relevant. The switch powers 9 APs and services over 300 customers.
Re: WS-12-250-AC - Locking up?
Posted: Thu Jul 30, 2015 4:37 pm
by sirhc
sbyrd wrote:Could it be heat related? The switch is reading 54C,69C,69C. Fan speed of 9800RPM
No, that temperature is fine. On that model a board temp of 75C+ and a CPU/PHY temp of over 100C is OK.
sbyrd wrote:When it comes up it has constant RX oversize errors.
Well, it does not make this error up, so obviously it is receiving packets larger than the MTU is set for. Not that I think that can lock it up, but I would increase the MTU until this stops, then work backwards and see what could be sending this MTU size. Could this error be related to a BAD SFP module? Eh, anything is possible I guess.
I have NEVER seen our switches lock up except when we were testing loops on purpose.
We all know that there is a "possible" issue with Loop Protection. It does not affect everyone, but it does affect some people. Eric is busy trying to get the DC switch firmware done for Monday so I dare not bother him. But what could be happening is that Loop Protection is disabling the port for 90 seconds.
Why not try the following:
1) Disable Loop Protection on the Device/Configuration Tab
2) Increase your MTU to 9600 and see what happens to the RX Oversize errors
3) As a last ditch effort, replace the SFP module.
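The "increase the MTU until this stops, then work backwards" idea above amounts to a search for the smallest MTU at which the RX oversize counter stops climbing. A minimal Python sketch of that search follows. The `errors_stop_at` callable is hypothetical: it stands in for setting the port MTU by hand and watching the counter for a minute, not for any real Netonix interface. It also assumes that if errors stop at one MTU they stay stopped at every larger one.

```python
from typing import Callable


def smallest_clean_mtu(errors_stop_at: Callable[[int], bool],
                       low: int = 1500, high: int = 9600) -> int:
    """Return the smallest MTU in [low, high] where errors_stop_at(mtu) is True.

    Assumption: once errors stop at some MTU, they also stop at any larger MTU.
    """
    if not errors_stop_at(high):
        raise ValueError("errors persist even at the highest MTU tried")
    while low < high:
        mid = (low + high) // 2
        if errors_stop_at(mid):
            high = mid       # clean here, so the answer is mid or lower
        else:
            low = mid + 1    # still erroring, so the answer is above mid
    return low


# Hypothetical stand-in for the manual check: pretend the largest
# frame arriving on the port is 1526 bytes.
def fake_probe(mtu: int) -> bool:
    return mtu >= 1526


print(smallest_clean_mtu(fake_probe))
```

The value found this way only says how large the biggest incoming frame is. Working out which device sends frames of that size is still a separate step.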
Re: WS-12-250-AC - Locking up?
Posted: Thu Jul 30, 2015 9:15 pm
by sbyrd
Well it happened again just now. I had not gotten a chance to upgrade to 1.24 or turn off loop protection.
Since it is late I just remote rebooted the switch again.
After the reboot I turned off loop protection. I was still getting constant RX oversize errors. The count was up to 15000 in under 2 minutes. I upped the MTU on the Trunk port to 1602 and they stopped. I then backed down to 1528 MTU and still saw no RX oversize errors. I have put it back to 1602, and if it goes down again I will see whether I get the RX oversize errors at 1602.
If it goes down again when I am in the office tomorrow and closer to the tower, I am going to first reseat the SFP. If that does not work, I will move the trunk port from the Copper SFP to Eth Port 1 on the switch without rebooting to see if it is an SFP issue or a switch issue.
I will also upgrade to 1.24 first thing in the morning when I am closer should anything major happen during the upgrade.
So that is 3 times today that this happened, and it has never happened before on this switch. It has also never happened on my WS-8-250 AC, but if it is an SFP issue that would make sense as the 8 has no SFP ports.
Re: WS-12-250-AC - Locking up?
Posted: Thu Jul 30, 2015 9:36 pm
by sbyrd
This is what the log on the Mikrotik router at the tower showed during the 'outage'. The Netonix is on ether3_trunk (ignore ether4 as it is another issue not related to Netonix).
[Attachment: Log.PNG]
Here are the interface stats. Normally there are no errors.
[Attachment: Interface.PNG]
Re: WS-12-250-AC - Locking up?
Posted: Thu Jul 30, 2015 9:45 pm
by sbyrd
I will also mention for those not versed in Mikrotik that the L2MTU listed for a routerboard interface "indicates the maximum size of the frame without MAC header that can be sent by this interface". As I understand it, the L2MTU of my tower router is 1580, so the maximum frame it can send is 1580. That means the RX oversize errors should have stopped earlier in the day when I set the MTU to 1580 on the Netonix. At this point I am leaning heavily towards an SFP issue, as it should not be possible for the router to send anything larger than 1580 ever.
The MTU setting on Mikrotik controls IP/Layer 3 MTU or "Specifies how big IP packets router is allowed to send out the particular interface". That is set to 1500 on my router.
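For anyone comparing these numbers, here is a small Python sketch of standard Ethernet frame-size arithmetic. The header, 802.1Q tag, and FCS sizes are the usual Ethernet values. Whether the Netonix MTU field counts the MAC header or the FCS is not stated anywhere in this thread, so treat the last figure as an illustration under that assumption and not as a diagnosis.

```python
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
VLAN_TAG = 4     # one 802.1Q tag
FCS = 4          # frame check sequence (CRC) at the end of the frame


def frame_size(payload: int, vlan_tags: int = 0, include_fcs: bool = True) -> int:
    """Size in bytes of an Ethernet frame carrying `payload` bytes after the MAC header."""
    return payload + ETH_HEADER + VLAN_TAG * vlan_tags + (FCS if include_fcs else 0)


# A 1500-byte IP packet, untagged and with one VLAN tag.
print(frame_size(1500))               # 1518
print(frame_size(1500, vlan_tags=1))  # 1522

# If the quoted L2MTU of 1580 really excludes the MAC header, the full
# frame could be larger than 1580 bytes.
print(frame_size(1580, include_fcs=False))  # 1594
print(frame_size(1580))                     # 1598
```

Under that reading, a device whose MTU setting counts the whole frame could still see frames above 1580 bytes from a router with an L2MTU of 1580.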
Re: WS-12-250-AC - Locking up?
Posted: Thu Jul 30, 2015 10:24 pm
by sirhc
After talking to you on the phone and thinking about this, I think the SFP module is BAD and you lose connection with the switch since that is how you talk to the switch.
Replace the SFP module and see if the problem stops.
When you reboot the switch it reinitializes the SFP module.
Re: WS-12-250-AC - Locking up?
Posted: Thu Jul 30, 2015 10:55 pm
by sbyrd
sirhc wrote:After talking to you on the phone and thinking about this, I think the SFP module is BAD and you lose connection with the switch since that is how you talk to the switch.
Replace the SFP module and see if the problem stops.
When you reboot the switch it reinitializes the SFP module.
So you're telling me my $22 Mikrotik copper SFP module can go bad? I paid top dollar for it and expect it to last forever!
Anyway, I am definitely leaning that way too, as the alternative is much harder to diagnose or fix. I will replace the SFP in the morning and see how it goes. This, however, would be the first SFP I have had go bad. I have used dozens of the Mikrotik copper SFPs with no issue, but since they are so cheap a bad one now and again should be expected.
I do hope it is the SFP and not the issue you had with one of your SFPs during storms (power blip).
Re: WS-12-250-AC - Locking up?
Posted: Sat Aug 01, 2015 4:31 pm
by sbyrd
Knock on wood. It has been over 24 hours without another connection issue. All I have changed so far is setting the MTU on the uplink port to 1602 and turning off Loop Protection, as I do not need it. Still on 1.22.
Now this is not conclusive as the previous config worked with no issue for a couple of months.
Should it remain stable or go offline again I will update this thread.
Re: WS-12-250-AC - Locking up?
Posted: Sun Aug 02, 2015 12:00 pm
by sbyrd
It just happened again, and this time no amount of remote rebooting fixed it. The Ethernet port on the router showed constant up/down.
Since I was over an hour away I tried some things on the Mikrotik router. The thing that brought the link back up was changing flow control on the router from auto to either on or off. I left it as On.
Maybe a Mikrotik/Netonix issue with auto flow control?
If it happens again, I hope I will be closer so I can try a new SFP or move the link to an Eth port on the switch.
Re: WS-12-250-AC - Locking up?
Posted: Sun Aug 02, 2015 12:15 pm
by sbyrd
Well, I spoke too soon. It just went down again. I got one of my installers to go out there and move the link from the SFP to Eth port 1 on the switch. It's back up and I will see if it remains up.
Should that fix it, I must assume a bad copper SFP.