Another mystery freak out and reboot
Posted: Fri Jul 08, 2016 5:21 pm
I chimed in on the 1.4.0 thread here with my WS-6-Mini:
viewtopic.php?f=17&t=1722&p=13096#p13096
I was hoping this was a firmware issue, since that's where that thread went (high CPU utilization, STP/discovery problems).
Yesterday the switch did the same thing. I got a bunch of alerts that the switch and everything beyond it were unreachable, and the uplink for this switch showed the port flapping up and down. At some point it cleared up, and the only evidence I have afterward is that the switch rebooted itself (it then seems to behave normally for a time). There was no intervention to bring the switch back.
I have a syslog server and all my stuff logs there, but the only peep I have out of this switch is the boot sequence once it's done rebooting.
Am I looking at some kind of hardware failure? I have a few more of these switches without this problem, also with similar gear plugged into them (UBNT ac radios - Rockets and PowerBeams). They did have the same high CPU usage that has since been resolved, but no lockups/reboots like this. The only unique thing about this switch compared to the others is that the uplink goes to a Cisco 3550.
Some logs below, just to show the timeline.
The upstream switch showing the port going up/down:
Jul 7 10:45:41.863 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to down
Jul 7 10:45:41.863 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan46, changed state to down
Jul 7 10:45:41.863 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan101, changed state to down
Jul 7 10:45:42.871 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 10:50:25.495 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 10:50:27.775 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 10:50:52.856 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 10:50:55.180 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 11:00:12.821 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 11:00:15.301 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 11:03:30.985 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 11:03:33.185 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 11:09:26.513 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 11:09:28.633 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 11:12:58.622 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 11:13:00.626 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to up
Jul 7 11:13:04.394 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to down
Jul 7 11:13:08.998 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/46, changed state to up
Jul 7 11:13:38.999 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan46, changed state to up
Jul 7 11:13:38.999 EDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan101, changed state to up
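To put numbers on the flapping rather than eyeballing it, here is a rough Python sketch that pulls the %LINK-3-UPDOWN lines out of the Cisco log and measures how long the link stayed up each time. The function names and the regular expression are my own for illustration, and the year 2016 is assumed from the post date because the log lines carry no year.

```python
import re
from datetime import datetime

# %LINK-3-UPDOWN lines copied from the upstream switch log above.
CISCO_LOG = """\
Jul 7 10:45:42.871 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 10:50:25.495 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 10:50:27.775 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 10:50:52.856 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 10:50:55.180 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 11:00:12.821 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 11:00:15.301 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 11:03:30.985 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 11:03:33.185 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 11:09:26.513 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
Jul 7 11:09:28.633 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to down
Jul 7 11:12:58.622 EDT: %LINK-3-UPDOWN: Interface FastEthernet0/46, changed state to up
"""

# Hypothetical pattern for this one message type; other IOS messages are skipped.
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d{3}) \w+: %LINK-3-UPDOWN: "
    r"Interface (\S+), changed state to (up|down)$"
)


def parse_link_events(text, year=2016):
    """Return (timestamp, interface, state) tuples for each link event."""
    events = []
    for line in text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S.%f"
            )
            events.append((stamp, match.group(2), match.group(3)))
    return events


def up_durations(events):
    """Seconds between each 'up' event and the 'down' event that follows it."""
    durations = []
    last_up = None
    for stamp, _interface, state in events:
        if state == "up":
            last_up = stamp
        elif last_up is not None:
            durations.append((stamp - last_up).total_seconds())
            last_up = None
    return durations


durations = up_durations(parse_link_events(CISCO_LOG))
print([round(d, 3) for d in durations])
```

On the lines above, every completed up period lasts between two and three seconds before the link drops again. The final "up" at 11:12:58 has no matching "down", so it is not counted.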
Switch log from syslog:
Dec 31 19:00:52 192.168.3.48 Port: link state changed to 'up' (1G) on port 3
Dec 31 19:00:52 192.168.3.48 STP: set port 3 to discarding
Dec 31 19:00:55 192.168.3.48 STP: set port 3 to learning
Dec 31 19:00:55 192.168.3.48 STP: set port 3 to forwarding
Jul 7 11:13:47 192.168.3.48 Port: link state changed to 'down' on port 4
Jul 7 11:13:47 192.168.3.48 STP: set port 4 to discarding
Jul 7 11:13:49 192.168.3.48 Port: link state changed to 'up' (100M-F) on port 4
Jul 7 11:13:49 192.168.3.48 STP: set port 4 to discarding
Jul 7 11:13:49 192.168.3.48 switch[902]: !unexpected link change on port 4 10M-F
Jul 7 11:13:52 192.168.3.48 STP: set port 4 to learning
Jul 7 11:13:52 192.168.3.48 STP: set port 4 to forwarding
Jul 7 11:15:41 192.168.3.48 UI: Configuration backup by bwayadmin (xxxxx)
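A note on the "Dec 31 19:00" lines: they look like what a device logs before its clock has been set, since 19:00 on Dec 31 at UTC-5 is midnight UTC on Jan 1, 1970 (the Unix epoch). If that reading is right, the seconds part is roughly the time since boot. A small Python sketch of that arithmetic follows. The function name is made up, and both the fixed UTC-5 offset and the year 1969 are assumptions, not something the log states.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the switch renders an unset clock in US Eastern standard time (UTC-5).
EST = timezone(timedelta(hours=-5), "EST")
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_since_epoch(stamp, tz=EST):
    """Convert a syslog stamp like 'Dec 31 19:00:52' to seconds after the Unix epoch.

    The year 1969 is assumed, because an unset clock starting at the epoch
    shows as Dec 31, 1969 in a UTC-5 zone.
    """
    local = datetime.strptime(f"1969 {stamp}", "%Y %b %d %H:%M:%S")
    return int((local.replace(tzinfo=tz) - UNIX_EPOCH).total_seconds())


for stamp in ("Dec 31 19:00:52", "Dec 31 19:00:55"):
    print(stamp, "->", seconds_since_epoch(stamp), "s after epoch")
```

Under those assumptions, port 3 came up about 52 seconds after the switch started, which fits with these being boot-sequence messages logged before the clock was synced.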
Switch is still running 1.4.2rc6 (seems my release notification PMs are going to spam these days - it will get an upgrade tonight). Unit pulls about 18W. Port 1 is a trunk port going to a Cisco 3550 (VLAN 1 and 101), port 3 is also a trunk port going to a Rocket ac, and port 4 is an access port (VLAN 1) going to a RocketM. Outside temperature yesterday was around 91F. Board temp was showing around 58C and CPU temp about 83C. A nearby unit was reading about 5C higher overall with no issues.
Where to start with debugging this? Or is it wiser to just throw another unit up there?