
WS-6-MINI has failed every day this week

Posted: Mon Aug 01, 2016 4:30 pm
by FrankPetrilli
Two WS-6-MINIs deployed in a NEMA enclosure at a tower site. Powered via 50V AirFiber injector. Second switch initially had issues with STP setting all ports to blocking (that thread was created by my colleague) and this seems to have not happened again.

However, the first switch has failed every single day since install. PoE stays up on all output ports, but the device seems to go into a bootloop of some kind. Nothing is in the logs, and no issues are seen while it is running fine. However, when it fails, we can see ports go link-layer down, coming back up for one second every few minutes.

Here are the relevant logs from the fine switch next to it; the inter-switch-link is on port 6:
Code:
Aug 1 12:01:13 kernel: link state changed to 'down' on port 6
Aug 1 12:01:26 kernel: link state changed to 'up' on port 6
Aug 1 12:01:27 kernel: link state changed to 'down' on port 6
Aug 1 12:02:21 kernel: link state changed to 'up' on port 6
Aug 1 12:02:22 kernel: link state changed to 'down' on port 6
Aug 1 12:24:41 kernel: link state changed to 'up' on port 6
Aug 1 12:24:42 kernel: link state changed to 'down' on port 6
Aug 1 12:33:33 kernel: link state changed to 'up' on port 6
Aug 1 12:33:35 kernel: link state changed to 'down' on port 6
Aug 1 12:57:41 kernel: link state changed to 'up' on port 6
Aug 1 12:57:43 kernel: link state changed to 'down' on port 6
Aug 1 12:58:57 kernel: link state changed to 'up' on port 6
Aug 1 12:58:59 kernel: link state changed to 'down' on port 6
Aug 1 13:24:22 kernel: link state changed to 'up' on port 6
Aug 1 13:24:23 kernel: link state changed to 'down' on port 6
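
To put numbers on the flapping in a log like the one above, each 'up' event can be paired with the 'down' that follows it. This is only a minimal Python sketch, not a Netonix tool: the names `parse_events` and `up_windows` are hypothetical, the sample text is copied from the log above, and the year is an assumption because the log lines do not carry one.

```python
import re
from datetime import datetime

# Sample copied from the log above (port 6 is the inter-switch link).
LOG = """\
Aug 1 12:01:13 kernel: link state changed to 'down' on port 6
Aug 1 12:01:26 kernel: link state changed to 'up' on port 6
Aug 1 12:01:27 kernel: link state changed to 'down' on port 6
Aug 1 12:02:21 kernel: link state changed to 'up' on port 6
Aug 1 12:02:22 kernel: link state changed to 'down' on port 6
Aug 1 12:24:41 kernel: link state changed to 'up' on port 6
Aug 1 12:24:42 kernel: link state changed to 'down' on port 6
Aug 1 12:33:33 kernel: link state changed to 'up' on port 6
Aug 1 12:33:35 kernel: link state changed to 'down' on port 6
Aug 1 12:57:41 kernel: link state changed to 'up' on port 6
Aug 1 12:57:43 kernel: link state changed to 'down' on port 6
Aug 1 12:58:57 kernel: link state changed to 'up' on port 6
Aug 1 12:58:59 kernel: link state changed to 'down' on port 6
Aug 1 13:24:22 kernel: link state changed to 'up' on port 6
Aug 1 13:24:23 kernel: link state changed to 'down' on port 6
"""

LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) kernel: "
    r"link state changed to '(?P<state>up|down)' on port (?P<port>\d+)$"
)


def parse_events(text, year=2016):
    """Return (timestamp, state, port) tuples; `year` is assumed, not logged."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a link-state line
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["state"], int(m["port"])))
    return events


def up_windows(events, port):
    """Pair each 'up' with the next 'down' on `port`; return (start, seconds)."""
    windows = []
    up_since = None
    for ts, state, p in events:
        if p != port:
            continue
        if state == "up":
            up_since = ts
        elif up_since is not None:
            windows.append((up_since, (ts - up_since).total_seconds()))
            up_since = None
    return windows


if __name__ == "__main__":
    for start, seconds in up_windows(parse_events(LOG), port=6):
        print(f"{start:%H:%M:%S}  link up for {seconds:.0f}s")
```

Run against the sample, this finds seven brief windows in which the link is up for one or two seconds before dropping again, separated by gaps ranging from under a minute to roughly 25 minutes.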

This switch has been taking out an entire tower for a week and support threads have gone unanswered. I'd like to get in contact with Netonix directly regarding this issue if possible; I'd like to replace these with a WS-12-250-DC ASAP.

Thank you,
Frank

Re: WS-6-MINI has failed every day this week

Posted: Mon Aug 01, 2016 4:33 pm
by sirhc
Hello Frank,

What firmware are you running?

There are some issues with older firmware on large flat network segments with STP.

Also, on large flat networks some people had issues with Discovery protocols enabled. (You could try disabling those.)

There is also a known bug: if you have SMTP enabled for alerts and you have communication issues with your mail server, or your mail server is slow, it would take up CPU time and cause issues with STP.

Most of these issues have been addressed in v1.4.3rc4.

Could you possibly try v1.4.3rc4?

We are still tracking down an issue from a customer in Kenya that "so far" seems isolated to him and his configuration, and when we fix his issue there will be a v1.4.3rc5.

Re: WS-6-MINI has failed every day this week

Posted: Mon Aug 01, 2016 4:48 pm
by FrankPetrilli
Firmware is 1.4.3rc3, though this occurred on 1.4.2 and 1.3.8.

I've been reading the forums for the past week. All of this has been tried:

The network segments are incredibly small; my network is designed as fully-routed as possible. RSTP has been disabled since day 2, as I'm 100% confident loops are not going to occur.

All discovery protocols are disabled.

SMTP is not enabled.

When I can get someone out to the site to pull power on the switch and reboot it, I'll attempt an install of 1.4.3rc4, but I'm hoping to just call it quits with this WS-6-MINI model. We've had incredible results from the WS-12-250-DC model at other sites, and I'd much rather have that device in place.

Thank you,
Frank

Re: WS-6-MINI has failed every day this week

Posted: Mon Aug 01, 2016 5:02 pm
by sirhc
If you are having this issue with the WS-6-MINI then you will have the exact same issue with any of our switches, as all our switches use the EXACT SAME SWITCH CORE.

You literally could de-solder the core from the WS-6-MINI and put it on the WS-24-400A and it would work.

All of our switches use the EXACT SAME FIRMWARE.

So if you're having an issue here it is not model related as all our switches are the same from the WS-6 to the WS-24.

There has to be something going on at this site that is different from the other sites.

I know it is a pain and maybe you did it before, but there are so many posts and so many threads that I have a hard time keeping them all straight.

At this point you have done the right thing and started a thread for your issue.

Please explain again what happens to the switch at this site.

Please post up screen grabs of all your Tabs.

Re: WS-6-MINI has failed every day this week

Posted: Mon Aug 01, 2016 5:17 pm
by FrankPetrilli
I'm aware the switch core is the same across all devices, as is your codebase. However, your boards are different, power supplies are different, and thermal properties are different, no? This is a smaller board. Smaller boards have shorter traces. Maybe there's a timing error on this board that wouldn't occur in the larger switches. Maybe the thermal dissipation from caps in the PSU is different since it's a smaller, fanless case, and a spike in AirFiber required amperage would cause the CPU / Switch ASIC to drop some voltage and die. I'm an Electrical, Software, and Network Engineer. Saying that the switch core is homogenous means nothing when issues could come from a thousand different places.

I understand that you're swamped with support requests, and I should have made it more clear that I've performed all steps before creating this thread.

At random intervals, SW1 on this tower will do the following:

Show link state down (for example, running ethtool on a connected Linux device will show "Link detected: no")
Retain PoE power to all connected devices
Periodically reappear to connected devices as link up for one second, as shown in the log file posted above.
Remain in this state until power is pulled, then fall into it at a random time later.
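
For anyone who wants to catch these transitions from the Linux side, ethtool reports link state on a `Link detected:` line, which is easy to check from a script. The following is a minimal Python sketch under stated assumptions: `link_detected` and `query_link` are hypothetical helper names, `eth0` is a placeholder interface name, and ethtool is assumed to be installed on the connected device.

```python
import re
import subprocess


def link_detected(ethtool_output):
    """Return True/False from the 'Link detected:' line, or None if absent."""
    m = re.search(
        r"^\s*Link detected:\s*(yes|no)\s*$", ethtool_output, re.MULTILINE
    )
    return None if m is None else m.group(1) == "yes"


def query_link(iface):
    """Run `ethtool <iface>` and report whether it sees a link."""
    result = subprocess.run(
        ["ethtool", iface], capture_output=True, text=True, check=True
    )
    return link_detected(result.stdout)


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the interface facing the switch.
    print(query_link("eth0"))
```

Calling `query_link` from a loop or a cron job and logging each change with a timestamp would give a second record of the flaps, independent of the neighbouring switch's log.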

The switch has now fallen into this state again after an attempted firmware upgrade. Unfortunately, the device won't stay alive long enough for me to post any screenshots.

Re: WS-6-MINI has failed every day this week

Posted: Mon Aug 01, 2016 5:29 pm
by Dave
Frank

This is our best-selling switch model, and many, many thousands have been sold & used worldwide. We have yet to have a case where there has been a discovered design flaw with it. Usually the few & far between issues with the 6 port mini relate to inadequate power sources feeding it, or to someone having a bad radio that is causing issues.

Would be curious to know what you are powering and how much POE load (in watts) you are using, etc.

There is also the slim chance that you have a bad unit. Our % failure rate in the field for this model is very low, but it does happen.

Dave

Re: WS-6-MINI has failed every day this week

Posted: Mon Aug 01, 2016 5:36 pm
by FrankPetrilli
Here are the images.
Screen Shot 2016-08-01 at 2.25.14 PM.png
Screen Shot 2016-08-01 at 2.24.59 PM.png
Screen Shot 2016-08-01 at 2.24.46 PM.png
Screen Shot 2016-08-01 at 2.24.25 PM.png

Re: WS-6-MINI has failed every day this week

Posted: Mon Aug 01, 2016 5:42 pm
by Dave
Frank

This thing is running at a low board temp for this design, so the issue is not overheating for sure, and your POE loads are minimal.

Something else must be going on.

I will let Chris comment later when he gets a chance.

Good to know you like our DC model as we are always looking for feedback, good or bad, because we always try & design in any improvements or enhancements for next generation products.

Dave

Re: WS-6-MINI has failed every day this week

Posted: Mon Aug 01, 2016 6:30 pm
by sirhc
OK so your snip from the log below is indicating that the port 6 link which is not even using POE is going up and down.

Where does port 6 go?

Code:
Aug 1 12:01:13 kernel: link state changed to 'down' on port 6
Aug 1 12:01:26 kernel: link state changed to 'up' on port 6
Aug 1 12:01:27 kernel: link state changed to 'down' on port 6
Aug 1 12:02:21 kernel: link state changed to 'up' on port 6
Aug 1 12:02:22 kernel: link state changed to 'down' on port 6
Aug 1 12:24:41 kernel: link state changed to 'up' on port 6
Aug 1 12:24:42 kernel: link state changed to 'down' on port 6
Aug 1 12:33:33 kernel: link state changed to 'up' on port 6
Aug 1 12:33:35 kernel: link state changed to 'down' on port 6
Aug 1 12:57:41 kernel: link state changed to 'up' on port 6
Aug 1 12:57:43 kernel: link state changed to 'down' on port 6
Aug 1 12:58:57 kernel: link state changed to 'up' on port 6
Aug 1 12:58:59 kernel: link state changed to 'down' on port 6
Aug 1 13:24:22 kernel: link state changed to 'up' on port 6
Aug 1 13:24:23 kernel: link state changed to 'down' on port 6


The above log indicates that the Ethernet link for port 6 is going up and down?

Please describe this site:
1) What is this tower, please explain what it is?

2) Where is the WS-6-MINI located?

3) What is in port 6 and where is that device located?
Have you looked into the possibility that the device on port 6, or possibly the cabling, is at fault?

4) Where is the AF24 POE brick located and how much cable between the brick and the WS-6-MINI?

5) Where does the AC power for the POE brick come from?

6) I assume you are powering an AFX radio in port 2? If so why are you not using 48VH?
Read this post: viewtopic.php?f=6&t=1215#p9040
So long as the AFX radio is a newer one it is better to power with 48VH as the WS-6-MINI would not have to down convert voltage which would mean less loss, greater efficiency, and less heat.

Re: WS-6-MINI has failed every day this week

Posted: Tue Aug 02, 2016 12:13 am
by FrankPetrilli
From above:

Here are the relevant logs from the fine switch next to it; the inter-switch-link is on port 6:


1. The tower is a small PoP. 1x RB2011, 2x Rocket AC PTMP, 1x NetMetal 5ac, 1x AirFiber 5X, 1x PowerBeam 5ac
2. Described above, port 6 is a trunk between two WS-6-MINIs (SW1/SW2). SW1 is failing at random intervals. The data from port 6 is me showing what the failing switch looks like from SW2.
3. Port 6 on each switch is a link to the other switch's port 6. Directly next to it, connected with a 6" cable.
4. At about 15 meters, located inside the house adjacent. We've checked cable integrity and reterminated, swapped injectors, etc etc.
5. From a TrippLite UPS (sine wave)
6. The AF5X is old enough that it doesn't support 48VH.