WS-26-500-DC intermittent power issues??
-
intellipop - Member
- Posts: 46
- Joined: Tue Nov 10, 2015 11:10 pm
- Location: Salt Lake City, UT
- Has thanked: 7 times
- Been thanked: 2 times
Re: WS-26-500-DC intermittent power issues??
Sorry Chris, let me put this in more understandable terms. The switch rebooted, and its current uptime as of this moment (reference the time stamp on the post) is 8 hours and 50 minutes.
-
intellipop - Member
- Posts: 46
- Joined: Tue Nov 10, 2015 11:10 pm
- Location: Salt Lake City, UT
- Has thanked: 7 times
- Been thanked: 2 times
Re: WS-26-500-DC intermittent power issues??
intellipop wrote:Here is the uptime rundown of our 26 port switches:
6 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
23 Minutes - What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
8 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
2 Days - What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
61 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
8 Hours- What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
Yes, the three in question have been loaded with 1.5.0rc1, still not improving the rebooting issue.
All of our 26 port switches with a rebooting issue are WS-26-500-DC models.
Mat
-
sirhc - Employee
- Posts: 7421
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1609 times
- Been thanked: 1326 times
Re: WS-26-500-DC intermittent power issues??
intellipop wrote:Sorry Chris let me put this in more understandable terms. The switch rebooted and it's current uptime as of this moment (reference the time stamp in the post) is 8 Hours and 50 Minutes.
OK so it rebooted 8+ hours ago and you just noticed it?
It had been up for 2+ days then rebooted?
Yes, I would appreciate it if you put v1.5.0rc1 on all WS-26-500-DC units and see if there is any improvement on any of them.
Also look at the switch log: at the top of the log, v1.5.0rc1 now indicates whether the unit booted via the watchdog. This is important for us to know.
Please post the first 20 lines of the log.
Support is handled on the Forums not in Emails and PMs.
Before you ask a question, use the Search function to see if it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.
-
intellipop - Member
- Posts: 46
- Joined: Tue Nov 10, 2015 11:10 pm
- Location: Salt Lake City, UT
- Has thanked: 7 times
- Been thanked: 2 times
Re: WS-26-500-DC intermittent power issues??
Here you go.
Dec 31 19:00:06 netonix: 1.5.0rc1-201803191145 on WS-26-500-DC
Dec 31 19:00:11 system: Setting MAC address from flash configuration: EC:13:B2:06:09:3E
Dec 31 19:00:14 admin: adding lan (eth0) to firewall zone lan
Dec 31 19:00:15 admin: Unable to query power supply
Dec 31 19:00:27 STP: MSTI0: New root on port 2, root path cost is 20000, root bridge id is 32768.64-D1-54-D5-11-AB
Dec 31 19:00:47 UI: i2c error setting 0x47 12 110
Dec 31 19:01:08 UI: i2c error setting 0x47 14 122
Dec 31 19:01:12 dropbear[931]: Running in background
Dec 31 19:01:16 switch[976]: Detected cold (watchdog) boot
I EDITED YOUR LOG ELIMINATING NOISE AND KEEPING LINES OF INTEREST - sirhc
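For anyone checking several units, here is a minimal Python sketch for pulling the boot reason out of a pasted log. It assumes the line format shown in the log above; the `boot_reason` helper and its regular expression are illustrative only, not a Netonix tool, and other boot reasons may be worded differently in the real firmware.

```python
import re

# Assumed format, taken from the single example line in this thread:
#   "switch[976]: Detected cold (watchdog) boot"
# The word in parentheses is treated as the boot reason.
BOOT_LINE = re.compile(r"switch\[\d+\]: Detected \w+ \((?P<reason>\w+)\) boot")


def boot_reason(log_text, max_lines=20):
    """Return the boot reason found in the first max_lines lines, or None."""
    for line in log_text.splitlines()[:max_lines]:
        match = BOOT_LINE.search(line)
        if match:
            return match.group("reason")
    return None


sample = """Dec 31 19:00:06 netonix: 1.5.0rc1-201803191145 on WS-26-500-DC
Dec 31 19:00:15 admin: Unable to query power supply
Dec 31 19:01:16 switch[976]: Detected cold (watchdog) boot"""

print(boot_reason(sample))  # watchdog
```

The 20-line limit mirrors the request for the first 20 lines of the log.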
-
intellipop - Member
- Posts: 46
- Joined: Tue Nov 10, 2015 11:10 pm
- Location: Salt Lake City, UT
- Has thanked: 7 times
- Been thanked: 2 times
Re: WS-26-500-DC intermittent power issues??
sirhc wrote:intellipop wrote:Sorry Chris let me put this in more understandable terms. The switch rebooted and it's current uptime as of this moment (reference the time stamp in the post) is 8 Hours and 50 Minutes.
OK so it rebooted 8+ hours ago and you just noticed it?
It had been up for 2+ days then rebooted?
Yes I would appreciate you put v1.5.0rc1 on all WS-26-500-DC and see if there is any improvements on any units.
Also look at the switch log, at the top of the log v1.5.0rc1 now indicates if the unit booted via the watchdog - this is important for us to know.
Post up the first 20 lines of the log please.
Yes, it's a switch we have set up in our office just to run testing with you without impacting customers, so no, it's not monitored in real time. I logged in and noticed it had rebooted again by noting the uptime of the device.
Yes, I believe the uptime prior to the reboot was around 2 days.
-
sirhc - Employee
- Posts: 7421
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1609 times
- Been thanked: 1326 times
Re: WS-26-500-DC intermittent power issues??
intellipop wrote:intellipop wrote:Here is the uptime rundown of our 26 port switches:
6 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
23 Minutes - What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
8 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
2 Days - What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
61 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
8 Hours- What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
Yes, the three in question have been loaded with 1.5.0rc1, still not improving the rebooting issue.
All of our 26 port switches with a rebooting issue are WS-26-500-DC models.
Mat
Are all of the units above WS-26-500-DC models?
If so, one has been up for 61 days, another 8 days, and another 6 days?
Obviously these units do not have v1.5.0rc1 on them.
If the other 3 have been loaded with v1.5.0rc1, what about the one that has been up 23 minutes? Did it just reboot, or did you load v1.5.0rc1 23 minutes ago?
For the one that has been up 2 days, obviously we cannot say yet.
The one that has been up 8 hours you said is in the office; how are you powering it?
-
intellipop - Member
- Posts: 46
- Joined: Tue Nov 10, 2015 11:10 pm
- Location: Salt Lake City, UT
- Has thanked: 7 times
- Been thanked: 2 times
Re: WS-26-500-DC intermittent power issues??
sirhc wrote:intellipop wrote:intellipop wrote:Here is the uptime rundown of our 26 port switches:
6 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
23 Minutes - What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
8 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
2 Days - What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
61 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
8 Hours- What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
Yes, the three in question have been loaded with 1.5.0rc1, still not improving the rebooting issue.
All of our 26 port switches with a rebooting issue are WS-26-500-DC models.
Mat
Are all the units above all WS-26-500-DC ?
If so 1 has been up for 61 days, another 8 days, another 6 days?
Obviously these units do not have v1.5.0rc1 on them.
If the other 3 have been loaded with v1.5.0rc1 the one that is up 23 minutes? Did it just reboot or did you just load v1.5.0rc1 23 minutes ago?
The one that is up 2 days obviously we can not say yet?
The one that is 8 hours you said is in office, how are you powering it?
The unit with the 23-minute uptime was updated after it crashed again. Since it's a production site, we don't normally have the ability to take it down at any time, but with the rash of outages at random times due to switch reboots, we attempted the update after monitoring reported it down.
I don't have any new data on the switch with the two-day uptime, other than to say I'm not holding my breath, as we have already seen crashes on 1.5.x with the office test switch.
The office switch is on a 30 V supply; the remote switches vary from solar to boat battery chargers.
Mat
-
sirhc - Employee
- Posts: 7421
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1609 times
- Been thanked: 1326 times
Re: WS-26-500-DC intermittent power issues??
Dec 31 19:01:16 switch[976]: Detected cold (watchdog) boot
So this line is what I am looking for.
To clarify you have (6) WS-26-500-DC units?
Here is the uptime rundown of our 26 port switches:
6 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
23 Minutes - What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
8 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
2 Days - What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
61 Days - What mode WS-26-400-AC or WS-26-500-DC - Obviously you have not loaded v1.5.0rc1
8 Hours- What mode WS-26-400-AC or WS-26-500-DC - Did you load v1.5.0rc1?
Please load v1.5.0rc1 on all WS-26-500-DC units tonight.
PM me your cell number, time zone, and when you're available tomorrow.
I can call you tomorrow and TeamViewer into your computer, as I want to look at some things, if that works for your schedule.
-
sirhc - Employee
- Posts: 7421
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1609 times
- Been thanked: 1326 times
Re: WS-26-500-DC intermittent power issues??
Mat, please stop saying crash. The unit is rebooting; there is a BIG difference.
I want to work with you, but you're not helping by continuing to call an event something it is not. I need to work with facts and details when trying to debug something like this, especially from afar. The switch you RMA'd back to us ran perfectly for days in our office, and we were never able to make it reboot. v1.5.0rc1 was a best guess as to the cause, as we could not recreate it. There are HUNDREDS of these models out there, with 3 or 4 people having issues. We were excited to get your unit back, thinking we could recreate the issue, but the unit worked flawlessly. The only way we could make it reboot was to create a network loop and flood the switch with 7+ million pps.
I understand your frustration, and trust me, I am not having fun, but obviously all our other models have been rock solid for you.
The "REBOOT" is a problem, as it takes your node down for several minutes while it reboots, and from what I can gather this is happening anywhere from once a day to once every several days or longer.
You can always remove the WS-26-500-DC models from service until we resolve the issue.
-
sirhc - Employee
- Posts: 7421
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1609 times
- Been thanked: 1326 times
Re: WS-26-500-DC intermittent power issues??
So please install v1.5.0rc1 on all WS-26-500-DC units.
PM me your cell number and a time to call after lunch EST; later in my afternoon is best if possible.
I want to TeamViewer into your machine, then log in to the switches and check some things.
Your log indicates the WATCHDOG is initiating the reboots, which means it is not hardware but rather software. It "could" be a loose I2C cable, which is one thing I want to look at tomorrow; I can tell from looking at the unit's UI.
Another possible cause is a broadcast storm which I said will trigger the WATCHDOG to reboot the switch.
I need you to install v1.5.0rc1 on all units and see if it "improves" the situation, because there is no difference between a WS-12-250-DC and a WS-26-500-DC other than more ports and (2) 250W power supplies working in tandem, and this is not an issue with the WS-12-250-DC or any other model.
The WS-26-500-DC runs at higher CPU utilization than a WS-12-250-DC because it has more ports and more telemetry to handle, which is why we concentrated on CPU utilization.
As I said, though, you can trigger a watchdog reboot with a loop on any of our models: the loop causes a broadcast packet storm, which spikes the CPU, the watchdog fails to report in time, and you get a reboot.
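To make the mechanism concrete, here is a rough Python model of a watchdog timer, not the switch firmware itself. It assumes the watchdog must be serviced ("kicked") at least once per timeout window or it forces a reboot; the function name, the kick schedule, and the timeout values are all illustrative.

```python
def watchdog_trips(kick_times, timeout):
    """Return the time (seconds) at which the watchdog would fire, or None.

    kick_times: ascending times at which the software services the watchdog.
    timeout:    maximum seconds allowed between two services.
    """
    last = 0.0
    for t in kick_times:
        if t - last > timeout:
            return last + timeout  # the timer expired before this kick arrived
        last = t
    return None


# Normal load: the watchdog is serviced every 0.5 s.
normal = [0.5 * i for i in range(1, 11)]

# Storm: the CPU is saturated after t=2.0 s, so the next service is 1.5 s late.
storm = [0.5, 1.0, 1.5, 2.0, 3.5, 4.0]

print(watchdog_trips(normal, timeout=1.0))  # None
print(watchdog_trips(storm, timeout=1.0))   # 3.0
print(watchdog_trips(storm, timeout=2.0))   # None
```

In this toy model the same 1.5 s stall trips a short timeout but not a longer one, which is why both CPU load and the timeout value are worth looking at.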
We were discussing this right before I left the office today as an area to look at more closely this week.
The switch you sent us on RMA ran perfectly for days, and we could only cause a reboot by creating a loop.
The CPU is getting hit and causing the watchdog to trigger; the question or mystery we need to figure out is WHY?
As I said, there is no difference between ANY of our switches: same basic design, only the number of ports and the type of power supply change. Same firmware, same hardware, yet this is only affecting the WS-26-500-DC, and as I said, WHY is what we are trying to figure out.
And why do some people, most people, not see the reboot but a few people do?
We have eliminated the power supply as being the cause. This is NOT power or grounding related.
This is why I was curious for you to substitute a WS-12-250-DC for one of the WS-26-500-DC. If it still occurred then we would know it's not the CPU utilization.
As I said, we are also in the process of looking at the I2C cable possibly being loose and needing to be reseated, maybe in conjunction with CPU load?
It could also be that we need to limit the broadcast packets allowed to hit the CPU, or increase the watchdog timeout from 1 second to possibly 2 seconds, simply because the WS-26-500-DC CPU is too busy keeping track of telemetry for so many ports and needs a longer timeout trigger.
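As a sketch of what limiting broadcast packets to the CPU could look like, here is a generic token-bucket limiter in Python. This is a textbook technique used for illustration, not what the firmware does; the class name, the 100 pps rate, and the burst of 50 are assumed values.

```python
class TokenBucket:
    """Generic token-bucket limiter (illustrative, not the switch firmware)."""

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps        # tokens added per second
        self.capacity = burst       # maximum tokens the bucket can hold
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous packet

    def allow(self, now):
        """Return True if a packet arriving at time `now` may reach the CPU."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# 10,000 broadcast packets arriving evenly across one second (a small storm).
bucket = TokenBucket(rate_pps=100, burst=50)
passed = sum(bucket.allow(i / 10000) for i in range(10000))
print(passed)  # roughly burst + rate, i.e. about 150 of the 10,000 packets
```

With a limiter like this the CPU would see a bounded packet rate during a loop, at the cost of dropping most broadcast traffic while the storm lasts.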
Trust me, Mat, we want to solve this as well, but I do not have a crystal ball, and sometimes it takes a bit of time and a couple of tries. v1.5.0rc1 was our first attempt, and it would be helpful to know if it IMPROVES the situation even if it does not completely solve it; this information lets us know if we are headed in the right direction.
UBNT has been working on their Flow Control Packet Storm bug on AF, AFX, and airMAX AC radios for over a year now; the longest it has taken us to solve an issue like this is a fraction of that time.
Anyway, I look forward to looking at your switches tomorrow.