Hi Folks,
I have two WS-12-250-DC switches on a tower. Both switches are in the same equipment box and have been running flawlessly since installed.
On Sunday (4/9) around 3pm I lost all access to the tower. (It's my first tower, so I don't have redundant paths yet...) I traced the problem to the Netonix switches but decided to power cycle them before isolating things further.
Yesterday around 6pm the same thing happened again but I took the time to isolate it to one switch. (I plugged into a spare port and could not ping a thing including the switch. I rushed there and didn't have a console cable with me... otherwise I would have checked that way as well.)
All radios are connected to routers running OSPF, and all routers are connected to both switches for redundancy.
The AF24 back haul to the data center is powered from one of the switches and has a dedicated vlan to its router.
Flow control is turned off on all AirFibers. The Netonix switches have negotiated full FC with the ERXs.
The switches were running 1.4.5rc2 which was the latest when I installed the switches last Oct. Yesterday, I upgraded both to 1.4.6.
Network monitoring does not show anything strange just before the switch becomes non-responsive.
The equipment box housing the switches is in the shade for most of the day, though it catches some sun in the mid to late afternoon. The equipment boxes are ventilated and have thermostat-controlled fans. (I did test the fans yesterday but am wondering if the thermostat was sticky and did not kick in over the last few days. Needless to say, I lowered the set point.)
I'm bringing up the fans because I'm thinking about the device temps given the charts below.
Switch 1 recent history (this is the switch that became non-responsive)
Switch 2 recent history
Switch 1 full history
Switch 2 full history
What I find interesting is that the temp profiles for the two switches are so different. (Note that switch 1 powers the AF24, while switch 2 powers an AF5x.)
For instance the PHY Temp and the DCDC Control Temp.
I've ordered a new switch and will install it Sat morning.
In the meantime, a couple of questions:
1) What should I check if this happens again?
2) What are the odds this is temp related?
3) What else should I consider?
Thanks
Mark
Non-responsive WS-12-250-DC
- Julian
Re: Non-responsive WS-12-250-DC
Hi Mark.
1.4.6 is a couple of months old at this point. I'm not sure that it directly affects your issue; however, the current version is rc18, and I would recommend you upgrade.
Temps appear to be within nominal ranges; per my conversation with our engineer this morning, temps under 90 are okay with respect to the switch board's continued function.
A lot of these random lockups are linked to corrupt firmware images; in your case I can't say.
thanks,
Julian
Re: Non-responsive WS-12-250-DC
OK. I'll upgrade to the latest release candidate.
By "corrupt firmware images" do you mean bugs in the firmware or that the copy of the firmware installed in the switch is corrupt?
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Non-responsive WS-12-250-DC
I would upgrade the firmware, BUT if this happens again, BEFORE you reboot the switch, unplug the cables from the switch to the routers and then plug them back in.
I highly doubt this is a firmware issue or a switch lockup.
Also, please post up "all" your switch TABs so I can see your configuration.
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see if it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.
Re: Non-responsive WS-12-250-DC
Screen shots of the tabs are below.
However, a bit of explanation is likely required. My config is based on Jim McNally's design for redundant routers. My initial implementation is described here.
In short
* Every PTP link gets its own subnet and is physically connected to a dedicated ERX
* Each PTP ERX is connected to a) the PTP radio and b) two different subnets referred to as switch domains: 10.1.1.0/24 and 10.1.2.0/24 in my case.
* All broadcast PTMP APs are connected to a different set of dedicated broadcast (BCAST) routers using 10.1.0.0/24.
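To make the addressing plan above concrete, here is a minimal Python sketch that maps an address to one of the three subnets named in the list. The labels ("switch domain 1" and so on) and the `domain_of` helper are made up for illustration; only the prefixes come from this post.

```python
import ipaddress

# Prefixes taken from the post; the labels are hypothetical names for illustration.
SUBNETS = {
    "switch domain 1": ipaddress.ip_network("10.1.1.0/24"),
    "switch domain 2": ipaddress.ip_network("10.1.2.0/24"),
    "broadcast (BCAST)": ipaddress.ip_network("10.1.0.0/24"),
}


def domain_of(addr):
    """Return the label of the subnet containing addr, or None if it is in none."""
    ip = ipaddress.ip_address(addr)
    for label, net in SUBNETS.items():
        if ip in net:
            return label
    return None
```

For example, `domain_of("10.1.1.254")` returns `"switch domain 1"`, while an address outside all three prefixes returns `None`.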
Jim's design requires that all of the PTP and BCAST routers are interlinked on the switch domain subnets. This works fine when there are only a few ERXs, but gets somewhat messy as the number of ERXs increases.
My current implementation uses the WS-12-250-DCs to primarily support switch domains. Simpler and cleaner physical cabling. Each PTP router has one cable to each switch and a third to the radio.
This describes the port usage of both Netonix switches.
(Note, all IP addresses listed are on router ports except for the switch IPs: 10.1.1.254 & 10.1.2.254)
If a radio only needs 24V then the radio is powered via the POE passthru of the ERX.
However, when a radio needs either 24VH or 48VH then the physical connection is ERX -> Netonix -> Radio data/poe port. And in this case the Netonix ports have a dedicated vlan for the PTP link.
With that out of the way, here are the switch tabs for switch 1. Switch 2 is nearly the same except for PTP link names and minor port differences.
(Regretting that I never created a dedicated switch domain vlan...)
All other tabs have defaults.
---
This has been working wonderfully well until this past Sunday and then again on Tuesday. On Tuesday the same thing occurred while I was still at the tower, about 40 minutes after I initially power cycled the switches. All has been stable since.
While I am continually adding new subscribers, no other major changes have been made recently. Also, the network monitoring does not show anything out of the ordinary, and traffic levels at the time of the outages were well below peak rates.
I would love to hear theories or configuration suggestions, but I'm also prepping to replace the switch Sat morning.
-
sirhc - Employee
Re: Non-responsive WS-12-250-DC
Well, me personally, before I rebooted anything I would have connected to the switch with a console cable and seen what was going on. I HIGHLY doubt the switch was locked up.
Also, I would have plugged my laptop into an open port such as port 12, with an IP that allowed me to talk directly to the switch, so I could gain access to the UI, where it is easier to see what's up.
To test for a packet lock, such as that created by Flow Control bugs, I would have unplugged and plugged back in 1 port at a time to see if everything started working again.
Look, I am not saying this is not a defective switch. We do find some here and there, but you have not done enough to be able to point the finger at that.
Now, if you're swapping it out with an identical unit to see whether the problem still exists (which would indicate it is not a hardware issue with the switch), that makes sense.
Also, as we discussed, you should be running v1.4.7rc18.
As far as creating mid-span injectors with VLANs like your VLAN 100 above that is exactly what we do at my WISP - works great.
Re: Non-responsive WS-12-250-DC
sirhc wrote: "Also I would have plugged my laptop into an open port such as port 12 with an IP that allowed me to talk directly to the switch so I could gain access to the UI which is easier to see what's up."
Did this the first and second times things went sideways. (Didn't have a serial console cable with me. Doh!)
sirhc wrote: "To test for a packet lock such as that created from Flow Control bugs I would have unplugged and plugged back in 1 port at a time to see if everything started working again."
Flow control is disabled on all radios (I've rechecked) and is enabled on all routers.
About an hour ago the problem occurred again. This time I connected to the switch and could not ping it, but I left ping running as I disconnected and reconnected the routers one by one. This recovered the situation.
I initially unplugged the routers that are powered by the other switch. This did not seem to correct the issue.
Then I unplugged the routers that are powered by the switch. This did correct the problem.
So it looks like there is something going on on the network side of things and not a problem with the switch.
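For anyone repeating the "leave ping running while replugging one cable at a time" test, a small script that timestamps every change in reachability makes it easier to match a recovery to the cable that was just replugged. This is only a sketch: it shells out to the system `ping`, the `-c`/`-n` flag choice is the usual Linux/macOS vs. Windows difference, and the function names are made up for illustration.

```python
import subprocess
import sys
import time
from datetime import datetime


def ping_once(host, timeout_s=3):
    """Send one echo request with the system ping; True if a reply came back."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def transitions(samples):
    """Return (index, new_state) for every change between consecutive samples."""
    changes = []
    previous = None
    for i, state in enumerate(samples):
        if previous is not None and state != previous:
            changes.append((i, state))
        previous = state
    return changes


def watch(host, interval_s=1.0):
    """Ping forever and print a timestamped line whenever reachability changes."""
    previous = None
    while True:
        ok = ping_once(host)
        if ok != previous:
            stamp = datetime.now().strftime("%H:%M:%S")
            print(f"{stamp} {host} is {'reachable' if ok else 'UNREACHABLE'}")
            previous = ok
        time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1])  # e.g. python pingwatch.py 10.1.1.254
```

Run it from the laptop plugged into the spare port, note the time each cable is pulled and replugged, and compare against the printed transitions.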
Nothing unusual is in the logs of the switches or the ERX that would point to a problem.
Unfortunately, at the moment I am at a loss for what could be causing the problems.
Storm control is disabled on both switches. The ERXs are running 1.9.0
sirhc wrote: "Now if you're swapping it out with an identical unit to see if the problem still exists which would indicate it is not a hardware issue with the switch that makes sense."
At this point I would like to try and resolve this from a network admin perspective, but will likely swap in the new switch if there are no obvious config things to try.
sirhc wrote: "Also as we discussed you should be running v1.4.7rc18"
Will upgrade later today.
sirhc wrote: "As far as creating mid-span injectors with VLANs like your VLAN 100 above that is exactly what we do at my WISP - works great."
I've watched the movies and read the forum. The mid-span injector and vlan are in place because I've learned about it here.
---
A couple of additional notes for background.
* Fully routed network
* Subscribers receive a public IP
* Firewalled on both head end and at CPEs
* Subscriber CPEs block all subscriber traffic destined for the CPE mgmt ports and private nets (10/8, 172.16/12, 192.168/16)
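As a quick sanity check on the last rule, here is a minimal Python sketch that tests whether a destination falls inside the private ranges. It assumes the intended ranges are the RFC 1918 blocks 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16; the helper name is made up for illustration, and this is not how the CPE firewall itself is configured.

```python
import ipaddress

# RFC 1918 private address blocks (assumed to be the "private nets" meant above).
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_destination(addr):
    """True if addr lies inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)
```

Note that 172.16.0.0/12 covers 172.16.0.0 through 172.31.255.255, so 176.16.5.1 is a public address and would not match.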
-
sirhc - Employee
Re: Non-responsive WS-12-250-DC
I would disable Flow Control on the ports facing the routers whose unplugging fixed the problem.
What you have here is a packet lock.
I think you said these were UBNT Routers?
While I love UBNT gear, I have very little faith in their implementation of Flow Control.
Flow Control on their switches, like the Edge Switch and UniFi Switch, is fine because they use switch cores, and the Flow Control is written by Broadcom, not by them. Likewise, we do not write our Flow Control; it is handled by our switch core and is written by Vitesse. We only turn it on and off.
My personal opinion is that swapping the switch will do nothing, I really do NOT think this is a switch problem.
I would examine the Port Stats on the ports facing the router in question.
Also the fact that you are powering the routers from the switch is NOT the issue.
Also, turn on Pause Frame Storm Protection on the Device Configuration tab. It is STUPID to ever turn this OFF, as it "can" sometimes detect a Pause Frame Storm and disable Flow Control, which is the same as unplugging the cable.
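When looking at the Port Stats, one way to spot a possible pause frame storm by hand is to note the received pause frame counter at a few points in time and look at the rate. The Python sketch below does that arithmetic. It is purely illustrative: the threshold and the "sustained" count are arbitrary assumptions, the function names are made up, and this is not a description of how the switch's Pause Frame Storm Protection actually decides.

```python
def pause_rates(samples):
    """samples: list of (timestamp_seconds, cumulative_pause_frame_count).

    Returns the pause frames per second between consecutive samples.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip out-of-order or duplicate timestamps
        rates.append((c1 - c0) / dt)
    return rates


def looks_like_storm(samples, threshold_per_s=1000.0, sustained=3):
    """True if the rate stays at or above the threshold for `sustained` intervals.

    Both threshold_per_s and sustained are hypothetical values; tune them
    against what a healthy port normally shows.
    """
    run = 0
    for rate in pause_rates(samples):
        run = run + 1 if rate >= threshold_per_s else 0
        if run >= sustained:
            return True
    return False
```

A port that shows a handful of pause frames per second is behaving normally under this sketch; a counter that climbs by thousands per second for several samples in a row is the pattern worth investigating.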
-
sirhc - Employee
Re: Non-responsive WS-12-250-DC
You can also create a Watchdog rule to ping the router connected to the switch. If the ping fails, the switch can either power cycle the router or simply drop the Ethernet link and re-establish it, just like unplugging the cable.
Read this post: viewtopic.php?f=6&t=2662#p18588
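The general idea behind a ping watchdog can be sketched in a few lines of Python. This is not the switch's implementation or its configuration syntax; the class name, the `max_failures` parameter and its default are assumptions chosen for illustration. The sketch only triggers after several consecutive failures, so a single lost ping does not bounce the link.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Counts consecutive ping failures and signals when to act."""

    max_failures: int = 3  # hypothetical default
    failures: int = 0

    def record(self, ping_ok):
        """Record one ping result; return True when the link should be bounced."""
        if ping_ok:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting again after acting
            return True
        return False
```

Feeding it one result per ping interval, a `True` return would be the point at which to bounce the port or power cycle the router.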