Adair, I am taking it seriously, but how would you suggest we figure it out?
If the switch is running fine now but locks up at some undetermined time, then looking at it right now shows that all is good, and there is no way to determine anything from a properly functioning switch.
I have suggested the following:
First, I would upgrade to v1.4.0rc12 and see if it happens again, as there were changes that could affect large flat segments and fixes to UBNT Discovery and MAC tables.
If that does not help, swap out the switch with a different one and see if the problem persists. If the replacement unit does not do it, that would indicate a bad unit.
If it still locks up after swapping, then we need to look at your config, look at grounding, look at power, and try to narrow it down.
I do not have any magic way to figure it out.
I have also explained that the only time I have ever run across a switch that locks up TIGHT, it was due to a power and grounding issue; I linked a post about it from last year above.
I am sure that a normal user would have blamed the switch with my problem, but in the end it was grounding and a wire running between buildings/electric services.
I simply said I do not think bandwidth has anything to do with it.
You say you have 40 switches in service. Are they ALL doing this? If not, then it has to be something environmental with these 2 locations, or something different in the network configuration / type of traffic (but I doubt this), OR a bad unit, which can be tested by swapping it out.
When this happens, have you attempted to use your console cable to verify it is a hard lock? If it is a hard lock, the only thing that can cause that is an electrical issue or a bad unit. If you swap it out and it persists, then we know it is not a bad unit.
Sometimes being a network technician is like being a detective.
But let's look at what we do know:
We know the switches can pass many GB of traffic, so it is not a capacity issue.
There are 12K switches out there and a "couple" of people having a "similar" problem. If this were a firmware issue or a hardware design flaw, everyone would have the issue, not a couple of people.
You yourself have 40+ switches, yet only have a problem at 2 locations.
We would love to help and/or fix this, but we need to narrow it down.
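Since the lockups happen at an undetermined time, one way to start narrowing it down is to record when they happen and how far apart they are, so they can be lined up against power events or other activity. The following is only a rough sketch in Python: it assumes you already collect time-ordered (timestamp, reachable) samples from something like a periodic ping job, which is not shown, and the names `outage_windows` and `gaps_between_outages` are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]          # (unix timestamp, did the switch answer?)
Window = Tuple[float, float]         # (first failed sample, first good sample after)


def outage_windows(samples: List[Sample]) -> List[Window]:
    """Turn time-ordered reachability samples into outage windows.

    An outage that is still open at the end of the data is closed at the
    timestamp of the last sample.
    """
    windows: List[Window] = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            windows.append((start, ts))     # outage ends
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))
    return windows


def gaps_between_outages(windows: List[Window]) -> List[float]:
    """Seconds from the end of one outage to the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(windows, windows[1:])]
```

If the gaps come out suspiciously regular, that points at something periodic; if they are random, power and grounding stay high on the list.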
Dropping ports on new WS, what is wrong with my setup?
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
Another thing I have seen in the past is a corrupt config file that causes a switch to lock up. In fact I just ran into this problem this past weekend.
I have (2) WS-24-400A units in my NOC which I upgrade all the time to test RC code. Apparently during one of the upgrades the 2nd switch got a corrupted config.
Took me all morning Sunday to figure it out. Whatever it was doing also brought the network down in my office/NOC, which I assume had something to do with the fact that the 2 switches have an LACP LAG between them and a STATIC LAG to the router, and a loop was forming???
If I rebooted the one switch with the corrupted config, the network would come up but would eventually lock up. After many hours of investigating, I noticed that the switch was not fully configured (the LACP LAG would not go active) and RSTP was preventing the loop. Now what would cause it to lock up hours later, I have no idea. But ultimately I noticed that if I attempted any change at all to the switch with the corrupted config, even some minor thing like a Port Description change, the network would crash and I could no longer get into the switch UI/CLI.
I did a factory default on the switch in question, where I hold in the default button for 20 seconds while powering it up. You then have to let it sit for several minutes as it re-formats part of the flash chip. Then I manually set the switch back up and the problem went away.
Now what caused the config to get corrupted, I have no idea. Shit happens sometimes with any piece of electronic equipment, so unless I see a pattern or many reports of this I will chalk it up to bad luck. So that is another possible thing to try: factory default and manually set back up. This is not a tech procedure unique to Netonix equipment, as I have had to do that in the past to all sorts of computers and network equipment. DO NOT EXPORT AND IMPORT, AS THE CONFIG MAY BE CORRUPTED.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
Please post a screencap of your Main Status Tab, Adair.
-
TheHox - Experienced Member
- Posts: 107
- Joined: Sat Sep 13, 2014 10:59 am
- Location: WI
- Has thanked: 11 times
- Been thanked: 18 times
Re: Dropping ports on new WS, what is wrong with my setup?
My issues started when we went to an MDU where we were upgrading the WiFi and using Netonix switches to power the APs.
We plugged 2 Netonix switches into a Netgear ProSafe that was onsite, and used the Netonix switches to power 18 UniFi APs. Everything was working fine. Then we left that afternoon, and about an hour later got a call that the internet was not working. I was not able to get to the switches, and the UniFi controller showed all my APs offline.
I sent a tech to power cycle the Netonix switches and all was good again.
Another 1.5 hr later it happened again. I had a tech reboot again, and at that time had him plug a patch cable directly from one of the switches to the router, into a port I had disabled remotely (to attempt to see what was going on later).
Like clockwork, another hour later, it happened a 3rd time. I was able to log in to the router and enable the port we had just plugged the patch cable into, and I was able to get to the switch to see the uplink port flapping on/off, which is the log file I posted in the original post. I went onsite, moved the 2 Netonix switches directly to the router, and bypassed the ProSafe, and those issues seemed to go away.
After that, we then noticed other issues.
A 2nd issue I had was that the Netonix switches also power some 5 port switches in each of the 18 units in the MDU, and some of them were doing a constant stream of data, like 16kpps solid. Ports 2, 6 and 8 in the attached image show the flood. A power cycle fixed it. We have since VLAN'd each of the units off.
We have had the Netgear ProSafe running in this MDU for over a year just fine, but the Netonix had some issues at first getting going. No loops, as we didn't change any of the wiring, just swapped out switches.
Running 1.3.9. We have about 20 switches across our WISP that usually are fine, but something really weird is going on here.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
TheHox wrote:I moved the 2 netonix's directly to the router and bypassed the ProSafe and those issues seemed to go away.
Not sure what to say about this. Is the problem the Netonix or the ProSafe, or do they just not like playing together? If you're really curious and you can cause it to happen like clockwork every hour, being on site with a console cable (if needed) and examining the switch logs would be a good start. I would also try this configuration with v1.4.0rc12 for the reasons stated.
TheHox wrote:After that, we then noticed other issues,
A 2nd issue I had, was, the Netonix's also power some 5 port switches in each of the 18 units in the MDU, some of them were doing a constant stream of data, like 16kpps solid Ports 2, 6 and 8 in the attached image show the flood. A power cycled fixed it. We have since vlan'd each of the units off.
We have had the NetGear Prosafe running in this MDU for over a year just fine, but the Netonix had some issues at first getting going, no loops as we didn't change any of the wiring just swapped out switches.
When you say you are powering 5 port switches with the WS-24-400A, what switches would they be?
As far as the constant stream of data, this could be the issue with UBNT Discovery that was fixed in v1.4.0rcX, but being on site with WireShark, using port mirror to your laptop, could easily determine what this data stream is.
TheHox wrote:Running 1.3.9, we have about 20 switches across our wisp that usually are fine, but something really weird going on here.
I would suggest trying v1.4.0rc12, as there were some fixes for large flat networks, which this sort of is.
Now an offhand suggestion, as you never know about pesky wannabe hackers:
Is the switch UI/CLI accessible from the apartments? If so, you might want to consider using the Access Control List in the switch to block access to the switch UI/CLI.
But as I said, I would try v1.4.0rcX.
If that does not help, I would do as I mentioned above: go on site, recreate the issue, and investigate what's going on, especially if you can recreate it in about an hour.
Have diagnostic equipment on hand when on site:
Laptop with WireShark
Console cable for switches
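For the port-mirror capture suggested above, here is a rough sketch (Python, standard library only) of how one might tally what a flood consists of once it has been captured. It assumes the capture was saved from WireShark in the classic pcap format (not pcapng) with Ethernet framing; `summarize_pcap` is a made-up helper name, and nothing here is specific to Netonix.

```python
import struct
import sys
from collections import Counter
from typing import BinaryIO

PCAP_MAGIC = 0xA1B2C3D4       # classic pcap with microsecond timestamps
LINKTYPE_ETHERNET = 1


def summarize_pcap(stream: BinaryIO) -> Counter:
    """Count frames by (destination kind, EtherType) in a classic pcap stream."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("file too short to be a pcap capture")
    # The magic number tells us which byte order the file was written in.
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not classic pcap (pcapng is not handled here)")
    if struct.unpack(endian + "I", header[20:24])[0] != LINKTYPE_ETHERNET:
        raise ValueError("capture is not Ethernet")

    counts: Counter = Counter()
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break                      # end of file
        _sec, _usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        frame = stream.read(incl_len)
        if len(frame) < incl_len:
            break                      # truncated capture
        if len(frame) < 14:
            continue                   # too short for an Ethernet header
        dst = frame[:6]
        if dst == b"\xff" * 6:
            kind = "broadcast"
        elif dst[0] & 0x01:
            kind = "multicast"         # group bit set in the first octet
        else:
            kind = "unicast"
        ethertype = struct.unpack("!H", frame[12:14])[0]
        counts[(kind, f"0x{ethertype:04x}")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for (kind, ethertype), n in summarize_pcap(fh).most_common():
            print(f"{n:8d}  {kind:9s}  type/len {ethertype}")
```

Run it with the capture file as the only argument. Values below 0x0600 in the last column are 802.3 length fields rather than EtherTypes (STP BPDUs show up that way), so a pile of broadcast or multicast frames with one type is a quick hint at what is flooding.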
-
WisTech - Associate
- Posts: 213
- Joined: Mon Aug 04, 2014 3:57 pm
- Has thanked: 8 times
- Been thanked: 64 times
Re: Dropping ports on new WS, what is wrong with my setup?
Your setup is super simple too, that's what doesn't make any sense. I was running RSTP on two ports on a local end 6-mini powering a pair of 5Xs on an NxN, as well as a remote 6-mini powering a pair of 5Xs on an NxN. What's even more strange is that suddenly I could not access the remote switch, or the remote IPs linked up, even if I powered down one of the ports here locally. I powered it back on and planned to power cycle the remote 6-mini, which would clear whatever glitch we had. So, at 11:45PM, it created a loop with another local port on the 6-mini and took down an AP, as well as another 5X link powered from a 24 port Netonix below, which is also what powers the local 6-mini on the roof.
24 port powers local 6-mini on the roof that powers a 500AC AP and twin 5Xs with RSTP enabled on both ends (LAG disabled)
24 port also powers multiple 5X radios that are not connecting various segments. It's a hard thing to reproduce, but why in the hell would it happen suddenly at 11:45 when no one was around, and no traffic on the remote of the NxN setup.
-
TheHox - Experienced Member
- Posts: 107
- Joined: Sat Sep 13, 2014 10:59 am
- Location: WI
- Has thanked: 11 times
- Been thanked: 18 times
Re: Dropping ports on new WS, what is wrong with my setup?
sirhc wrote:
TheHox wrote:I moved the 2 netonix's directly to the router and bypassed the ProSafe and those issues seemed to go away.
Not sure what to say about this? Is the problem the Netonix or the ProSafe or they just do not like playing together? If you're really curious and you can cause it to happen like clockwork every hour being on site with a console cable if needed and examining the switch logs would be a good start. I would also try this configuration with v1.4.0rc12 for reasons stated.
TheHox wrote:After that, we then noticed other issues,
A 2nd issue I had, was, the Netonix's also power some 5 port switches in each of the 18 units in the MDU, some of them were doing a constant stream of data, like 16kpps solid Ports 2, 6 and 8 in the attached image show the flood. A power cycled fixed it. We have since vlan'd each of the units off.
We have had the NetGear Prosafe running in this MDU for over a year just fine, but the Netonix had some issues at first getting going, no loops as we didn't change any of the wiring just swapped out switches.
When you say you are powering 5 port switches with the the WS-24-400A what switches would they be?
As far as the constant stream of data this could be the issue with UBNT Discovery that was fixed in v1.4.0rcX but being on site with WireShark could easily determine what this data stream is using port mirror to your laptop.
TheHox wrote:Running 1.3.9, we have about 20 switches across our wisp that usually are fine, but something really weird going on here.
I would suggest trying v1.4.0rc12 as there were some fixes with large flat networks which this is sort of.
Now an offhanded suggestion as you never know about pesky wannabe hackers:
Are these switches UI/CLI accessible from the apartments? If so you might want to consider using the Access control list in the switch to block access to the switch UI/CLI
But as I said I would try v1.4.0rcX.
If that does not help I would do as I mentioned above and go on site and recreate the issue and investigate what's going on especially is you can recreate it in about an hour.
Have diagnostic equipment on hand when on site:
Laptop with WireShark
Console cable for switches
We yanked the ProSafe out once we had the migration completed. I just had another UniFi AP go into isolated state; bouncing the port had no effect. I rebooted the switch, and it came back up. Still on 1.3.9; if it happens again I'll upgrade.
-
adairw - Associate
- Posts: 465
- Joined: Wed Nov 05, 2014 11:47 pm
- Location: Amarillo, TX
- Has thanked: 98 times
- Been thanked: 132 times
Re: Dropping ports on new WS, what is wrong with my setup?
I've seen that 16K pps and 8Mb stream of traffic start just before the switch locks up. If I can disable the port fast enough, it won't lock the switch up. I'll post screenshots later.
-
WisTech - Associate
- Posts: 213
- Joined: Mon Aug 04, 2014 3:57 pm
- Has thanked: 8 times
- Been thanked: 64 times
Re: Dropping ports on new WS, what is wrong with my setup?
adairw wrote:I've seen that 16K pps and 8Mb stream of traffic start just before the switch locks up. If I can disable to port fast enough it wont lock the switch up. I'll post screen shots later
Same here. I actually was able to get into my switch up on the roof and saw the one port that was dead as a doornail pegged at 16kpps and ~8Mbps.
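A quick sanity check on those two numbers, as a rough sketch: 8 Mbps at 16 kpps works out to about 62.5 bytes per packet, which is right around the 64-byte minimum Ethernet frame. That would fit a flood of tiny frames rather than real user traffic, though it assumes the UI counters are rounded and count whole frame bytes. The function names below are just for illustration.

```python
# Back-of-the-envelope check: what average frame size do the observed
# port counters imply? The two figures are the ones reported in this thread.
PACKETS_PER_SECOND = 16_000      # ~16 kpps seen on the flooding port
BITS_PER_SECOND = 8_000_000      # ~8 Mbps seen on the same port

MIN_ETHERNET_FRAME_BYTES = 64    # minimum Ethernet frame, including FCS


def average_frame_bytes(bits_per_second: float, packets_per_second: float) -> float:
    """Average bytes per frame implied by a bit rate and a packet rate."""
    return bits_per_second / packets_per_second / 8


def rate_mbps(frame_bytes: int, packets_per_second: float) -> float:
    """Bit rate in Mbps for a stream of fixed-size frames."""
    return frame_bytes * 8 * packets_per_second / 1_000_000


if __name__ == "__main__":
    # Implied average frame size from the two counters.
    print(average_frame_bytes(BITS_PER_SECOND, PACKETS_PER_SECOND))
    # Bit rate that 16 kpps of minimum-size frames would produce.
    print(rate_mbps(MIN_ETHERNET_FRAME_BYTES, PACKETS_PER_SECOND))
```

Going the other way, 16,000 minimum-size frames per second is 8.192 Mbps, so the two counters agree with each other.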
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
I am really curious to see if v1.4.0rc12 fixes this.
Please let me know.
If v1.4.0rc12 does not fix it, try disabling every service not needed under the Device/Configuration Tab, such as:
IGMP Snooping
Discovery
UBNT Discovery
LLDP
Cisco Discovery
Disable everything, then if that fixes it, enable 1 at a time until you find it.
But I am hoping v1.4.0rc12 fixes you up.
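The disable-everything-then-enable-one-at-a-time procedure above can be sketched as follows. This is only an illustration of the bookkeeping in Python: nothing here talks to a real switch, the toggling would most likely be done by hand in the UI, and `problem_occurs_with` is a hypothetical callback standing in for "enable exactly these services, wait long enough to be sure, and report whether the lockup came back".

```python
from typing import Callable, Iterable, List

# Service names as listed in the post above.
SERVICES = ["IGMP Snooping", "Discovery", "UBNT Discovery", "LLDP", "Cisco Discovery"]


def find_culprits(
    services: Iterable[str],
    problem_occurs_with: Callable[[List[str]], bool],
) -> List[str]:
    """Re-enable services one at a time and report which ones bring the problem back."""
    if problem_occurs_with([]):
        # Still failing with everything off: the services are not the cause.
        return []
    culprits: List[str] = []
    enabled: List[str] = []
    for service in services:
        trial = enabled + [service]
        if problem_occurs_with(trial):
            culprits.append(service)   # leave it off and keep going
        else:
            enabled = trial            # harmless, leave it on
    return culprits
```

Adding one service per round on top of the ones already shown to be harmless means each new failure can be pinned on the service that was just turned on.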