Problem with some device randomly rebooting

kevinpez
Member
 
Posts: 4
Joined: Mon Feb 01, 2016 1:35 pm
Has thanked: 0 time
Been thanked: 0 time

Problem with some device randomly rebooting

Thu Sep 22, 2016 11:24 am

We have updated our switch software to the most current release candidate and this problem still seems to be occurring, where a set of the devices on the switch appears to reboot at random. This only seems to be happening on one of the power boards, and ports 13-15 don't seem to be affected. The next time I'm at this tower I'll move some devices to the other side just to see if that helps, but this has got me scratching my head. It seems to be occurring two or three times a day. Please let me know what other information you might need to help troubleshoot.


Jan 1 00:00:09 netonix: 1.4.5rc2 on WS-24-400A
Jan 1 00:00:15 system: Setting MAC address from flash configuration: EC:13:B3:01:5B:D0
Jan 1 00:00:18 admin: adding lan (eth0) to firewall zone lan
Dec 31 17:00:32 admin: removing lan (eth0) from firewall zone lan
Dec 31 17:00:35 admin: adding lan (eth0) to firewall zone lan
Dec 31 17:00:47 admin: adding lan (eth0) to firewall zone lan
Dec 31 17:00:48 system: starting ntpclient
Dec 31 17:00:49 main: packet_rx_filter_change NOT IMPLEMENTED
Dec 31 17:00:51 Port: link state changed to 'up' (100M-F) on port 21
Dec 31 17:00:53 dropbear[1356]: Running in background
Dec 31 17:00:59 system: starting ntpclient
Dec 31 17:01:00 switch[1430]: Detected warm boot
Dec 31 17:01:02 Port: link state changed to 'down' on port 2
Dec 31 17:01:02 Port: link state changed to 'down' on port 3
Dec 31 17:01:02 Port: link state changed to 'down' on port 4
Dec 31 17:01:03 Port: link state changed to 'down' on port 5
Dec 31 17:01:03 Port: link state changed to 'down' on port 6
Dec 31 17:01:03 Port: link state changed to 'down' on port 8
Dec 31 17:01:03 Port: link state changed to 'down' on port 9
Dec 31 17:01:03 Port: link state changed to 'down' on port 10
Dec 31 17:01:03 Port: link state changed to 'down' on port 11
Dec 31 17:01:03 Port: link state changed to 'down' on port 12
Dec 31 17:01:03 Port: link state changed to 'down' on port 7
Dec 31 17:01:05 Port: link state changed to 'up' (10M-F) on port 2
Dec 31 17:01:05 Port: link state changed to 'up' (10M-F) on port 3
Dec 31 17:01:05 Port: link state changed to 'up' (10M-F) on port 12
Dec 31 17:01:05 Port: link state changed to 'up' (10M-F) on port 4
Dec 31 17:01:05 Port: link state changed to 'up' (10M-F) on port 6
Dec 31 17:01:05 Port: link state changed to 'up' (10M-F) on port 10
Sep 21 11:16:06 Port: link state changed to 'up' (100M-F) on port 7
Sep 21 11:16:06 Port: link state changed to 'up' (100M-F) on port 5
Sep 21 11:16:06 Port: link state changed to 'up' (100M-F) on port 11
Sep 21 11:16:06 Port: link state changed to 'up' (100M-F) on port 9
Sep 21 11:16:06 Port: link state changed to 'up' (100M-F) on port 8
Sep 21 11:16:23 Port: link state changed to 'down' on port 7
Sep 21 11:16:24 Port: link state changed to 'down' on port 11
Sep 21 11:16:24 Port: link state changed to 'down' on port 5
Sep 21 11:16:24 Port: link state changed to 'down' on port 9
Sep 21 11:16:24 Port: link state changed to 'down' on port 8
Sep 21 11:16:24 Port: link state changed to 'down' on port 4
Sep 21 11:16:25 Port: link state changed to 'down' on port 2
Sep 21 11:16:25 Port: link state changed to 'down' on port 10
Sep 21 11:16:25 Port: link state changed to 'down' on port 12
Sep 21 11:16:25 Port: link state changed to 'down' on port 6
Sep 21 11:16:25 Port: link state changed to 'down' on port 3
Sep 21 11:16:25 Port: link state changed to 'up' (100M-F) on port 7
Sep 21 11:16:25 Port: link state changed to 'up' (100M-F) on port 11
Sep 21 11:16:25 Port: link state changed to 'up' (100M-F) on port 5
Sep 21 11:16:25 Port: link state changed to 'up' (100M-F) on port 9
Sep 21 11:16:25 Port: link state changed to 'up' (100M-F) on port 8
Sep 21 11:16:26 Port: link state changed to 'up' (100M-F) on port 4
Sep 21 11:16:26 switch[1486]: unexpected link change on port 4 (Methodist#0East) from 10M-F100M-F to 100M-F
Sep 21 11:16:26 Port: link state changed to 'up' (100M-F) on port 10
Sep 21 11:16:26 Port: link state changed to 'up' (100M-F) on port 2
Sep 21 11:16:26 Port: link state changed to 'up' (100M-F) on port 6
Sep 21 11:16:26 Port: link state changed to 'up' (100M-F) on port 12
Sep 21 11:16:26 Port: link state changed to 'up' (100M-F) on port 3
Sep 21 11:16:27 switch[1488]: unexpected link change on port 10 (Capt Zipline PTP) from F to 100M-F
Sep 21 11:16:27 switch[1490]: unexpected link change on port 6 (Methodist#1East) from F to 100M-F
Sep 21 11:16:27 switch[1492]: unexpected link change on port 12 (MethodistB4) from F to 100M-F
Sep 21 11:16:27 switch[1494]: unexpected link change on port 3 (BearValley) from F to 100M-F
Sep 21 21:56:29 Port: link state changed to 'down' on port 3
Sep 21 21:56:29 Port: link state changed to 'down' on port 4
Sep 21 21:56:29 Port: link state changed to 'down' on port 5
Sep 21 21:56:29 Port: link state changed to 'down' on port 6
Sep 21 21:56:29 Port: link state changed to 'down' on port 7
Sep 21 21:56:29 Port: link state changed to 'down' on port 10
Sep 21 21:56:29 Port: link state changed to 'down' on port 11
Sep 21 21:56:29 Port: link state changed to 'down' on port 12
Sep 21 21:56:29 Port: link state changed to 'down' on port 2
Sep 21 21:56:31 Port: link state changed to 'up' (10M-F) on port 3
Sep 21 21:56:31 Port: link state changed to 'up' (10M-F) on port 4
Sep 21 21:56:32 Port: link state changed to 'up' (10M-F) on port 10
Sep 21 21:56:32 Port: link state changed to 'up' (10M-F) on port 12
Sep 21 21:56:32 Port: link state changed to 'up' (10M-F) on port 2
Sep 21 21:56:32 Port: link state changed to 'up' (10M-F) on port 6
Sep 21 21:56:32 switch[2663]: unexpected link change on port 3 (BearValley) from 100M-F to 10M-F
Sep 21 21:56:32 switch[2665]: unexpected link change on port 10 (Capt Zipline PTP) from 100M-F to 10M-F
Sep 21 21:56:32 switch[2667]: unexpected link change on port 12 (MethodistB4) from 100M-F to 10M-F
Sep 21 21:56:33 switch[2669]: unexpected link change on port 2 (PonchaPass) from 100M-10M-F10M-F to 10M-F
Sep 21 21:56:33 switch[2671]: unexpected link change on port 6 (Methodist#1East) from 100M-F to 10M-F
Sep 21 21:56:38 Port: link state changed to 'up' (100M-F) on port 7
Sep 21 21:56:38 Port: link state changed to 'up' (100M-F) on port 11
Sep 21 21:56:38 Port: link state changed to 'up' (100M-F) on port 5
Sep 21 21:56:55 Port: link state changed to 'down' on port 11
Sep 21 21:56:55 Port: link state changed to 'down' on port 7
Sep 21 21:56:55 Port: link state changed to 'down' on port 5
Sep 21 21:56:56 Port: link state changed to 'down' on port 4
Sep 21 21:56:56 Port: link state changed to 'down' on port 2
Sep 21 21:56:56 Port: link state changed to 'down' on port 10
Sep 21 21:56:56 Port: link state changed to 'down' on port 12
Sep 21 21:56:56 Port: link state changed to 'down' on port 6
Sep 21 21:56:56 Port: link state changed to 'down' on port 3
Sep 21 21:56:57 Port: link state changed to 'up' (100M-F) on port 7
Sep 21 21:56:57 Port: link state changed to 'up' (100M-F) on port 11
Sep 21 21:56:57 Port: link state changed to 'up' (100M-F) on port 5
Sep 21 21:56:57 Port: link state changed to 'up' (100M-F) on port 4
Sep 21 21:56:58 Port: link state changed to 'up' (100M-F) on port 2
Sep 21 21:56:58 Port: link state changed to 'up' (100M-F) on port 10
Sep 21 21:56:58 Port: link state changed to 'up' (100M-F) on port 6
Sep 21 21:56:58 Port: link state changed to 'up' (100M-F) on port 12
Sep 21 21:56:58 Port: link state changed to 'up' (100M-F) on port 3
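
For anyone who wants to pull the incidents out of a log like the one above, here is a minimal Python sketch. It groups link-down lines that occur within a few seconds of each other into one "burst" and lists the ports involved. The function name `group_down_bursts` and the 5-second window are assumptions made for illustration, not anything the switch firmware provides.

```python
import re

# Matches link-down lines in the format shown in the log above.
LINK_DOWN = re.compile(
    r"^(?P<day>\w{3}\s+\d+)\s+(?P<clock>\d\d:\d\d:\d\d) Port: "
    r"link state changed to 'down' on port (?P<port>\d+)$"
)


def to_seconds(clock):
    """Convert HH:MM:SS to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def group_down_bursts(lines, window=5):
    """Group link-down events that follow each other within `window` seconds.

    `window` is an assumed threshold. Bursts that span midnight are not
    merged, because the log lines carry no year and no unambiguous date.
    """
    bursts = []
    last = None  # (day, seconds) of the previous link-down event
    for line in lines:
        match = LINK_DOWN.match(line.strip())
        if not match:
            continue
        day = match.group("day")
        clock = match.group("clock")
        now = to_seconds(clock)
        port = int(match.group("port"))
        if last is not None and last[0] == day and 0 <= now - last[1] <= window:
            bursts[-1]["ports"].add(port)
        else:
            bursts.append({"start": f"{day} {clock}", "ports": {port}})
        last = (day, now)
    return bursts


# A few lines copied from the log above.
sample = [
    "Sep 21 11:16:23 Port: link state changed to 'down' on port 7",
    "Sep 21 11:16:24 Port: link state changed to 'down' on port 11",
    "Sep 21 11:16:25 Port: link state changed to 'up' (100M-F) on port 7",
    "Sep 21 21:56:29 Port: link state changed to 'down' on port 3",
]

for burst in group_down_bursts(sample):
    print(burst["start"], sorted(burst["ports"]))
```

Feeding the whole log through this should make it easier to see which ports drop together and how often.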


device-status.PNG

status-1.PNG

status-2.PNG

status-3.PNG

kevinpez
Member
 
Posts: 4
Joined: Mon Feb 01, 2016 1:35 pm
Has thanked: 0 time
Been thanked: 0 time

Re: Problem with some device randomly rebooting

Thu Sep 22, 2016 11:34 am

Just a little more info: all the devices are Ubiquiti Rocket M5 access points, and we are connected to a Tripp Lite 2400 W UPS. The device is grounded to the tower blocks with solid 12g wire.

sirhc
Employee
Employee
 
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Problem with some device randomly rebooting

Thu Sep 22, 2016 11:49 am

kevinpez wrote:the device is grounded to the tower blocks with solid 12g wire.

I know you say the UPS ground lug is connected to the tower ground, but are the AC service ground rods connected to the tower ground rods with a heavy #2 wire?

I do not see any damaged current sensors (thanks for thinking ahead and showing all ports on the Status Tab).

So I do not "think" this switch is damaged.

So there are (2) 24V power supplies on the WS-24, one for ports 1-12 and one for ports 13-24. Each is capable of 6A constant (144 watts) and 10A peak (240 watts).

The total watts being used on ports 1-12 is less than 60 watts constant so your demand is far below the capacity of the 24V power supply taking care of ports 1-12.
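
The arithmetic behind those figures can be checked with a short Python sketch. The voltage and current ratings and the 60 watt load are the numbers quoted above. The helper names `watts` and `headroom` are made up for illustration, and 60 watts is treated as an upper bound because the actual draw is stated as less than that.

```python
SUPPLY_VOLTS = 24    # each supply is a 24V supply
CONSTANT_AMPS = 6    # constant rating quoted above
PEAK_AMPS = 10       # peak rating quoted above


def watts(volts, amps):
    """Power in watts from voltage and current (P = V * I)."""
    return volts * amps


def headroom(load_watts, volts=SUPPLY_VOLTS, amps=CONSTANT_AMPS):
    """Return (spare watts, fraction of constant capacity in use)."""
    capacity = watts(volts, amps)
    return capacity - load_watts, load_watts / capacity


constant_capacity = watts(SUPPLY_VOLTS, CONSTANT_AMPS)  # 24 * 6 = 144
peak_capacity = watts(SUPPLY_VOLTS, PEAK_AMPS)          # 24 * 10 = 240

# 60 W is an upper bound on the reported load for ports 1-12.
spare, used_fraction = headroom(60)

print(f"constant capacity: {constant_capacity} W")
print(f"peak capacity: {peak_capacity} W")
print(f"spare at 60 W load: {spare} W ({used_fraction:.0%} used)")
```

So even at the 60 watt upper bound there are at least 84 watts of constant capacity left on that supply.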

I do see the radios rebooting in the log which is odd.

So I have some questions:
1) As asked above are the tower ground rods bonded to the electrical service ground rods with a heavy #2 wire?

2) How long has this switch been in service?

3) Did this behavior just recently start or has it been going on for awhile?

4) Are there any watchdog rules defined under the Tools/Watchdog Tab?

5) Can you please verify whether the UBNT Rockets have a ping watchdog enabled?

6) Did you log into the Rockets and verify that they are indeed rebooting by looking at their uptime?

7) The switch is not rebooting at all?
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.

kevinpez
Member
 
Posts: 4
Joined: Mon Feb 01, 2016 1:35 pm
Has thanked: 0 time
Been thanked: 0 time

Re: Problem with some device randomly rebooting

Thu Sep 22, 2016 1:20 pm

1) The UPS is grounded with a #4 solid wire, and I'm pretty sure that is sufficient; not sure a #2 would even fit in the lug. The AC must be grounded to the earth, this is a large tower site with many other providers on it. We are on our own circuit breakers and share the service, but I will verify next time I'm on site. Let's assume it's all good.

2) The switch was just put in service.

3) This problem started occurring right away.

4) There are no watchdog rules set in the switch.

watchdog.PNG


5) There is no watchdog rule set in the AP's.

ubnt-watchdog.PNG


6) The Rockets are indeed rebooting.

poncha-pass.PNG


7) I only rebooted the switch due to the firmware upgrade; the switch seems to stay stable during the whole incident.


I'm wondering if one device's wiring could cause another device to short out. Is there some kind of isolation between the ports? I'd be willing to believe that one device has bad wire ends or something, but not all of them.

sirhc
Employee
Employee
 
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Problem with some device randomly rebooting

Thu Sep 22, 2016 1:53 pm

kevinpez wrote:1) The UPS is grounded with a #4 solid wire, and I'm pretty sure that is sufficient; not sure a #2 would even fit in the lug. The AC must be grounded to the earth, this is a large tower site with many other providers on it. We are on our own circuit breakers and share the service, but I will verify next time I'm on site. Let's assume it's all good.

I was not suggesting a #2 ground wire to the UPS. I was asking whether the electrical service ground rods are bonded to the tower ground rods with #2 wire or heavier.

If not then your Ethernet cables become this bond and ground potential current starts flowing across your Ethernet cables.

Here are some good posts on grounding to better explain this concept and potential problem:
viewtopic.php?f=30&t=1816
viewtopic.php?f=30&t=188
viewtopic.php?f=30&t=1429
viewtopic.php?f=17&t=1786&start=30#p13447
https://community.ubnt.com/t5/airFiber/ ... rue#M31070

THESE LINKS ABOVE ARE REALLY GOOD READS

If your ground rods are not bonded (Electric Service Ground Rods connected to Tower Ground Rods) then you will soon start frying things in this order:
1) Current Sensors in the switch
2) Either the Ethernet port in the switch or the radios or both.

Now could this cause radios to reboot? Well, if there is enough ground current, yes, but more likely it will just start frying stuff.

kevinpez wrote:2) The switch was just put in service.
3) This problem started occurring right away.
4) There are no watchdog rules set in the switch.
5) There is no watchdog rule set in the AP's.
6) The Rockets are indeed rebooting.
7) I only rebooted the switch due to the firmware upgrade; the switch seems to stay stable during the whole incident.

2 through 7) OK

kevinpez wrote:I'm wondering if one device's wiring could cause another device to short out. Is there some kind of isolation between the ports? I'd be willing to believe that one device has bad wire ends or something, but not all of them.

If a wire is shorting "momentarily" and I mean just for a second it might cause a dip in the 24V power supply....."MAYBE".

You could test this theory by moving some to the higher ports, as they are on a different 24V power supply. Then if ports 1-12 reboot you know it is one of them, and if it is ports 13-24 you know it is one of them.
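
As a rough aid for that test, this small Python sketch maps a set of rebooting ports onto the two supplies described earlier (ports 1-12 and ports 13-24). The function names `supply_bank` and `banks_affected` are hypothetical, and the port list would come from whatever you observe in the log after moving the radios.

```python
def supply_bank(port):
    """Return which 24V supply feeds a port, per the 1-12 / 13-24 split."""
    if 1 <= port <= 12:
        return "ports 1-12"
    if 13 <= port <= 24:
        return "ports 13-24"
    raise ValueError(f"port {port} is outside 1-24")


def banks_affected(rebooting_ports):
    """Return the set of supplies that feed the ports seen rebooting."""
    return {supply_bank(port) for port in rebooting_ports}


# Ports that dropped in the log posted earlier in this thread.
observed = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(banks_affected(observed))
```

If the set only ever contains one supply after devices have been moved, the suspect device is on that side.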

New questions:
1) Do you have Ethernet surge suppressors on each line? Maybe they are causing the issue (I hate Ethernet surge suppressors).
You could try removing them and see if the reboots still occur.

2) Are you running ESD shielded wire and ends?

3) Can you run a cable diagnostics on each port driving a Rocket and report back if you see anything other than:

Pair 1: OK
Pair 2: OK
Pair 3: Short <= This is normal for UBNT airMAX 10/100 devices
Pair 4: Short <= This is normal for UBNT airMAX 10/100 devices

Also report back if the cable pairs on a port are not all the same length; if the cable is good, they should be the same length or very close.

Now if you have Ethernet Surge Suppressors on the line the results vary from manufacturer to manufacturer.
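
If you collect the diagnostics for many ports, a check like the following Python sketch can flag the ones worth a closer look. The expected statuses are the ones listed above for UBNT airMAX 10/100 devices with no surge suppressor in line. The input layout (pair number mapped to a status and a length), the function name `check_port` and the 2.0 length tolerance are all assumptions for illustration; this is not the switch's actual output format.

```python
# Expected per-pair result for a UBNT airMAX 10/100 device, as listed above.
EXPECTED_STATUS = {1: "OK", 2: "OK", 3: "Short", 4: "Short"}


def check_port(results, length_tolerance=2.0):
    """Return a list of problems for one port's cable diagnostics.

    `results` maps pair number -> (status, length). The layout and the
    tolerance are hypothetical; adjust them to whatever you record.
    """
    problems = []
    for pair, expected in EXPECTED_STATUS.items():
        if pair not in results:
            problems.append(f"pair {pair}: no result")
            continue
        status, _length = results[pair]
        if status != expected:
            problems.append(f"pair {pair}: {status} (expected {expected})")

    # On a good cable all pairs should report the same length or very close.
    lengths = [length for _status, length in results.values()]
    if lengths and max(lengths) - min(lengths) > length_tolerance:
        problems.append(
            f"pair lengths differ by {max(lengths) - min(lengths):.1f}"
        )
    return problems


good_cable = {1: ("OK", 30.0), 2: ("OK", 30.5), 3: ("Short", 30.0), 4: ("Short", 29.5)}
bad_cable = {1: ("OK", 30.0), 2: ("Open", 12.0), 3: ("Short", 30.0), 4: ("Short", 30.0)}

print(check_port(good_cable))
print(check_port(bad_cable))
```

An empty list means the port matches the normal pattern; anything else is worth reporting back.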

kevinpez
Member
 
Posts: 4
Joined: Mon Feb 01, 2016 1:35 pm
Has thanked: 0 time
Been thanked: 0 time

Re: Problem with some device randomly rebooting

Thu Sep 22, 2016 2:26 pm

There are at least two cabling problems on the site; I'm sending out our guys tomorrow to run through each line and see if the problems can be resolved. I'll let you know once we are sure each cable is good. We don't have any ethernet surge arrestors in line. Are there any you would recommend, do you feel that is necessary?

Yes, the wire is shielded and grounded. Here is a link to the datasheet:
http://www.primuscable.com/Shared/image ... -416BK.pdf

sirhc
Employee
Employee
 
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Problem with some device randomly rebooting

Thu Sep 22, 2016 2:54 pm

OK, you're using shielded cable with an ESD drain wire, but are you using ESD crimp ends that connect to the ESD drain wire, like these:
https://www.netonix.com/accessories/net-rj45-100.html

Do not care what brand you use, UBNT, RF Armor, Netonix or another brand that connects the ESD shield to the Drain wire.

Please ask your guys when on site to ensure the tower ground rods are bonded to the electrical service ground rods via a heavy cable like #2.
PLEASE LET ME KNOW ON THIS ISSUE - IMPORTANT

Also dedicated ground runs to each radio and extra large service loops on Ethernet cable are HIGHLY suggested.

Please take a few moments and read those Grounding Posts above, I spent a lot of time on them for you guys (WISPs). I am a WISP as well and learning "proper" grounding was one of my greatest achievements over the past 20 years of being a WISP.

horse1bun
Member
 
Posts: 5
Joined: Mon Sep 19, 2016 10:13 pm
Has thanked: 0 time
Been thanked: 3 times

Re: Problem with some device randomly rebooting

Thu Sep 22, 2016 7:12 pm

Hi guys, long time lurker, first time poster.

It seems ground potential, proper grounding, and the problems derived from them are a common theme with WISP towers?

I just wanted to say, sirhc, these articles are amazing. I have always felt I was lacking a complete understanding of proper grounding methods. It's dense reading for me, but Google searching has helped clarify the things I don't understand; these are well-written articles. Thank you so much for sharing your wisdom. Netonix is a great product and I look forward to working with them more across our network.

I am following this thread and eager to see what the solution is!
