Another mystery freak out and reboot

sporkman
Member
 
Posts: 86
Joined: Mon Jul 27, 2015 7:03 pm
Location: New York, NY
Has thanked: 8 times
Been thanked: 11 times

Fri Jul 08, 2016 5:21 pm

I chimed in on the 1.4.0 thread here with my WS-6-Mini:

viewtopic.php?f=17&t=1722&p=13096#p13096

I was hoping that this was a firmware issue, and that's where that thread went (high CPU utilization, STP/discovery problems).

Yesterday the switch did the same thing: I get a bunch of alerts that the switch and everything beyond it is unreachable, and the uplink for this switch shows the port flapping up and down. At some point it clears up, and the only evidence I have afterward is that the switch rebooted itself (it then seems to behave normally for a time). There was no intervention to bring the switch back.

I have a syslog server and all my stuff logs there, but the only peep I have out of this switch is the boot sequence once it's done rebooting.


Am I looking at some kind of hardware failure? I have a few more of these switches without this problem, also with similar gear plugged into them (UBNT ac radios - Rockets and PowerBeams). They did have the same high CPU usage that has since been resolved, but no lockups/reboots like this. The only unique thing about this switch compared to the others is that the uplink goes to a Cisco 3550.

Some logs below, just to show the timeline.

The upstream switch showing the port going up/down:

Jul  7 10:45:41.863 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to down
Jul  7 10:45:41.863 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan46, changed state to down
Jul  7 10:45:41.863 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan101, changed state to down
Jul  7 10:45:42.871 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 10:50:25.495 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 10:50:27.775 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 10:50:52.856 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 10:50:55.180 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 11:00:12.821 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 11:00:15.301 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 11:03:30.985 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 11:03:33.185 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 11:09:26.513 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 11:09:28.633 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 11:12:58.622 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 11:13:00.626 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to up
Jul  7 11:13:04.394 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to down
Jul  7 11:13:08.998 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to up
Jul  7 11:13:38.999 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan46, changed state to up
Jul  7 11:13:38.999 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan101, changed state to up
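
A rough way to quantify the flapping in the upstream log: the sketch below parses only the %LINK-3-UPDOWN lines and reports how long each link-up lasted before the port dropped again. The year (2016, taken from the post date) and the helper names are assumptions for illustration, not part of any Cisco or switch tooling.

```python
import re
from datetime import datetime

# Physical-link (%LINK-3-UPDOWN) lines copied from the upstream log above.
LOG = """\
Jul  7 10:45:42.871 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 10:50:25.495 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 10:50:27.775 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 10:50:52.856 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 10:50:55.180 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 11:00:12.821 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 11:00:15.301 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 11:03:30.985 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 11:03:33.185 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 11:09:26.513 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul  7 11:09:28.633 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul  7 11:12:58.622 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
"""

PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d\.\d+) \w+: "
    r"%LINK-3-UPDOWN: Interface \S+, changed state to (?P<state>up|down)$"
)


def parse_events(text, year=2016):
    """Return (datetime, state) pairs; the year is assumed, the log omits it."""
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
            events.append((when, m["state"]))
    return events


def short_lived_ups(events):
    """Seconds each 'up' lasted before the next 'down' event."""
    return [
        (later - earlier).total_seconds()
        for (earlier, s1), (later, s2) in zip(events, events[1:])
        if s1 == "up" and s2 == "down"
    ]


events = parse_events(LOG)
ups = short_lived_ups(events)
# Time from the first link-down to the last link-up in the excerpt.
outage = (events[-1][0] - events[0][0]).total_seconds()
print(f"{len(ups)} failed link-ups, each lasting {min(ups):.2f}-{max(ups):.2f} s")
print(f"first down to final up: {outage / 60:.1f} min")
```

On this excerpt it should report five link-ups of roughly two seconds each, spread over about 27 minutes, before the link finally stayed up.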


Switch log from syslog:

Dec 31 19:00:52 192.168.3.48 Port: link state changed to 'up' (1G) on port 3
Dec 31 19:00:52 192.168.3.48 STP: set port 3 to discarding
Dec 31 19:00:55 192.168.3.48 STP: set port 3 to learning
Dec 31 19:00:55 192.168.3.48 STP: set port 3 to forwarding
Jul  7 11:13:47 192.168.3.48 Port: link state changed to 'down' on port 4
Jul  7 11:13:47 192.168.3.48 STP: set port 4 to discarding
Jul  7 11:13:49 192.168.3.48 Port: link state changed to 'up' (100M-F) on port 4
Jul  7 11:13:49 192.168.3.48 STP: set port 4 to discarding
Jul  7 11:13:49 192.168.3.48 switch[902]: !unexpected link change on port 4 10M-F
Jul  7 11:13:52 192.168.3.48 STP: set port 4 to learning
Jul  7 11:13:52 192.168.3.48 STP: set port 4 to forwarding
Jul  7 11:15:41 192.168.3.48 UI: Configuration backup by bwayadmin (xxxxx)
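
A note on the "Dec 31 19:00:52" lines: they are probably from before the switch set its clock after booting. Unix time zero is 1970-01-01 00:00:00 UTC, which is Dec 31 19:00:00 at a UTC-5 offset. The sketch below just checks that arithmetic; that the switch starts counting from the epoch and applies a fixed UTC-5 offset is an assumption, not something confirmed in this thread.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC-5 offset (US Eastern standard time, no DST handling).
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def epoch_local(seconds_since_epoch):
    """Render a small Unix timestamp the way an unsynced clock would show it."""
    return datetime.fromtimestamp(seconds_since_epoch, tz=EASTERN_STANDARD)


stamp = epoch_local(52)  # 52 seconds after an epoch-zero clock start
print(stamp.strftime("%b %d %H:%M:%S"))
```

If that assumption holds, the first four syslog lines were sent under a minute after boot, and the Jul 7 lines are the first ones with a synced clock.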

 


Switch is still running 1.4.2rc6 (it seems my release notification PMs are going to spam these days; it will get an upgrade tonight). The unit pulls about 18W. Port 1 is a trunk port going to a Cisco 3550 (VLANs 1 and 101), port 3 is also a trunk port going to a Rocket ac, and port 4 is an access port (VLAN 1) going to a RocketM. The outside temperature yesterday was around 91F; the board temp was showing around 58C and the CPU temp about 83C. A nearby unit was reading about 5C higher overall with no issues.
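
Since the ambient reading is in Fahrenheit and the board readings are in Celsius, a quick conversion puts them on the same scale. This is only arithmetic on the numbers quoted above; the variable names are made up for the example.

```python
def fahrenheit_to_celsius(deg_f):
    """Standard Fahrenheit-to-Celsius conversion."""
    return (deg_f - 32) * 5 / 9


ambient_c = fahrenheit_to_celsius(91)  # outside temperature from the post
board_c, cpu_c = 58, 83                # readings reported by the switch

print(f"ambient: {ambient_c:.1f} C")
print(f"board is {board_c - ambient_c:.1f} C above ambient")
print(f"CPU is {cpu_c - ambient_c:.1f} C above ambient")
```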

Where to start with debugging this? Or is it wiser to just throw another unit up there?

sirhc
Employee
 
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Another mystery freak out and reboot

Fri Jul 08, 2016 6:00 pm

Well, yes, the best thing to do here is upgrade to v1.4.2 FINAL and see if it happens again.
The board temperatures are OK so no worries there.

So did you check if the radios on ports 3 and 4 rebooted (look at the uptime of the radios)? Your log indicates they may have rebooted.

What is powering this MINI?
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.

sporkman

Re: Another mystery freak out and reboot

Fri Jul 08, 2016 6:10 pm

sirhc wrote:Well, yes, the best thing to do here is upgrade to v1.4.2 FINAL and see if it happens again.
The board temperatures are OK so no worries there.


Cool. (No pun intended) This week has been pretty much as hot as it will get this summer.

Upgrade happens tonight...

sirhc wrote:So did you check if the radios on ports 3 and 4 rebooted (look at the uptime of the radios)? Your log indicates they may have rebooted.


Just looked, uptime matches the switch, so we had a reboot there.

sirhc wrote:What is powering this MINI?


UBNT AF power supply/injector.

Also, prior to this uptime was about 30 days.

sporkman

Re: Another mystery freak out and reboot

Thu Jul 14, 2016 3:43 pm

Happening again. Device is on 1.4.2. I'm going to have to throw another piece of hardware up there. We don't have a ton of people on this equipment, but they're all higher bandwidth business customers.

Edit: it came back after I bounced the port on the switch it is connected to, but that looks like a possible coincidence. Looking at it now, the uptime is only 3 minutes.

sirhc

Re: Another mystery freak out and reboot

Thu Jul 14, 2016 4:00 pm

Not sure what I can suggest other than swap the unit out and see if the problem goes away.


Possible problems:
Bad switch
Bad POE injector
Something totally different than what we think it is.

You could also have a bad radio that draws too much power and causes the switch to reboot. Another user reported the other day that a NanoBeam or PowerBeam was BAD, and after replacing it the problem went away.

sporkman

Re: Another mystery freak out and reboot

Thu Jul 14, 2016 4:23 pm

Someone's going to pop a new one in today if the weather cooperates. We're also swapping the PoE brick while we're at it. Not sure what to do to test the old one further, don't want to throw it on a rooftop unless I know it's not going to keep doing this.

Checked the syslog server as well, nothing from this unit prior to the port going down or the reboot. If the thing is panicking, is a core file stored anywhere after boot? Is there anything of interest stored across reboots?

sirhc

Re: Another mystery freak out and reboot

Thu Jul 14, 2016 4:43 pm

sporkman wrote:Someone's going to pop a new one in today if the weather cooperates. We're also swapping the PoE brick while we're at it. Not sure what to do to test the old one further, don't want to throw it on a rooftop unless I know it's not going to keep doing this.

Checked the syslog server as well, nothing from this unit prior to the port going down or the reboot. If the thing is panicking, is a core file stored anywhere after boot? Is there anything of interest stored across reboots?


Well, if the replacement does the same thing, start swapping radios one at a time.

sporkman

Re: Another mystery freak out and reboot

Sat Jul 16, 2016 3:48 pm

Yeah, I'm not going as far as replacing two new radios. If the switch were just rebooting without all these other symptoms prior to the reboot, I'd be more inclined to point the finger at the two UBNT radios.

The switch is freaking out, so I'm either going to find out why (it must be logging something somewhere while it's flapping), assume it's dying hardware, or just reevaluate what we put on the roofs out here.

There's a new mini waiting to be installed, should be in place on Monday.

In the meantime, same problem today. Port to the downstream switch goes down, flaps for quite a long time, then the switch reboots.

Jul 16 14:47:39.844 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to down
Jul 16 14:47:39.844 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan46, changed state to down
Jul 16 14:47:39.844 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan101, changed state to down
Jul 16 14:47:39.844 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan102, changed state to down
Jul 16 14:47:40.853 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 16 14:52:01.792 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 16 14:52:04.028 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 16 15:00:12.017 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 16 15:00:14.201 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 16 15:12:09.190 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 16 15:12:11.986 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 16 15:16:50.863 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 16 15:16:54.828 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 16 15:20:30.513 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 16 15:20:34.389 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 16 15:21:54.381 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 16 15:21:56.541 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 16 15:23:32.305 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 16 15:23:35.829 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 16 15:23:39.049 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 16 15:23:42.689 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to up
Jul 16 15:24:12.691 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan46, changed state to up
Jul 16 15:24:12.691 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan101, changed state to up
Jul 16 15:24:12.691 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan102, changed state to up


My non-tech colleague is 100% sure it's the heat, just because of the timing. I'm a bit less sure since all the units are within a few blocks of each other in the same (recommended) outdoor case. The others have also recorded higher temperatures.

[Attachment: Screen Shot 2016-07-16 at 3.42.11 PM.png (misbehaving switch)]
[Attachment: Screen Shot 2016-07-16 at 3.43.22 PM.png (good switch)]

Right now it is sweltering; a personal weather station within a few blocks is reading 97F.

[Attachment: Screen Shot 2016-07-16 at 3.39.38 PM.png (weather)]

sirhc

Re: Another mystery freak out and reboot

Sat Jul 16, 2016 4:06 pm

Those temps you're showing are fine for the WS-6.

But what I said was swap the switch first, and if the new switch is still doing this, then it has to be something else like a radio, grounding, or power.

Just because the radios are NEW does not mean there is not a bad one that is using too much power, which is what another person found out.

Wish I could find his post, but he swapped out a NEW PowerBeam, I think, and the problem went away.

But I agree, swap the switch first to eliminate the switch, then go from there.

sporkman

Re: Another mystery freak out and reboot

Fri Aug 12, 2016 5:20 pm

OK, I've had the new switch in for 24 days now, with 24 days of uptime.

Is this an RMA? I'm a bit leery of repurposing the old switch.

Do we want to go two months trouble-free to prove it out?
