Dropping ports on new WS, what is wrong with my setup?
- IntL-Daniel
- Experienced Member
- Posts: 170
- Joined: Mon Nov 02, 2015 5:07 pm
- Location: Czech Republic
- Has thanked: 7 times
- Been thanked: 9 times
Re: Dropping ports on new WS, what is wrong with my setup?
My bet is on an issue with the AF and its LAN port. At least there is a known issue of CRC errors being generated by internal CPU-to-CPU communication on all AF units. There was also an issue of non-working FC during the whole beta/RC period just before the 3.2 release. So I think that in the end something is still wrong with the FC. Of course, I hope I am wrong.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Same here
Uintadave wrote:This also brings up watchdog... which does not start or work if you have NTP enabled and the NTP server is unreachable....
What makes you think Watch Dog does not work if no NTP server is reachable?
Watchdog would not kick in in this case because the switch is NOT locked up. It is just blocking the ports from communicating: there are so many Pause Frames that the buffers fill up and traffic grinds to a halt. But the CPU is running fine, the core is running fine, and they can talk to each other, so watchdog is not going to reboot the switch.
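To put rough numbers on how little it takes to hold a port paused, here is a minimal sketch based on the standard IEEE 802.3x definition (one pause quantum is 512 bit times, and the timer field is 16 bits). The 1 Gbps link speed and the function names are assumptions for illustration only, not anything taken from the switch firmware.

```python
# Sketch: how long one IEEE 802.3x PAUSE frame can hold a transmitter off.
# One pause quantum is 512 bit times; the pause timer is a 16-bit field.

PAUSE_QUANTUM_BITS = 512
MAX_PAUSE_QUANTA = 0xFFFF


def pause_duration_seconds(quanta: int, link_bps: float) -> float:
    """Time the link partner is asked to stay silent for one PAUSE frame."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause timer is a 16-bit value")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


def min_refresh_rate(quanta: int, link_bps: float) -> float:
    """PAUSE frames per second needed to keep the link paused continuously."""
    if quanta == 0:
        raise ValueError("a zero timer means 'resume', it cannot hold a pause")
    return 1.0 / pause_duration_seconds(quanta, link_bps)


if __name__ == "__main__":
    gig = 1e9  # assumed 1 Gbps port
    longest = pause_duration_seconds(MAX_PAUSE_QUANTA, gig)
    print(f"longest single pause at 1 Gbps: {longest * 1e3:.2f} ms")
    print(f"frames/s needed to stay paused: {min_refresh_rate(MAX_PAUSE_QUANTA, gig):.1f}")
```

Under those assumptions a single frame with the maximum timer pauses a gigabit port for roughly 33.5 ms, so only about 30 such frames per second would be enough to keep a port silent indefinitely.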
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
tma wrote:sirhc wrote:Now keep in mind Thomas one problem you were having at that time was with LAG and RSTP or in your case not wanting to use RSTP with LAG as it masked a problem and through your persistence you were able to prove that bug to us and Eric fixed it. Just make sure not to confuse that problem with this problem, similar maybe, but not the same.
The problem that was fixed was another problem - it was easily reproducible from a simple recipe. The problem I had seen 4 times but then failed to reproduce was never solved (obviously). It looked very similar to the one in this thread: Same 8 Mbps / 15 Kpps symptom. But as said, similarity may end at this point, because in my case no AF was involved and no STP, just two Netonixes connected by one cable. The trigger may have come in from the port I was connected to, but even after I disconnected that port, they continued sending packets to each other - again: on an island of two Netonixes connected by one cable. (But when I connected to the other Netonix, it suddenly stopped.)
Anyway, my (unfixed) problem has not recurred for many months and because it could be that only the visible part looks alike, I'll leave it at that ... for the moment ;-)
Well for the record this problem ended up having nothing to do with RSTP, that was me jumping to conclusions, FALSE CONCLUSIONS as Eric pointed out to me today on Skype SEVERAL TIMES.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
Uintadave wrote:Also as a side note there was exactly 0 pps being transmitted to the AF5x
I know right, so why would the AF radio be telling the switch to slow down or Pause?
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
IntL-Daniel wrote:My bet is on an issue with the AF and its LAN port. At least there is a known issue of CRC errors being generated by internal CPU-to-CPU communication on all AF units. There was also an issue of non-working FC during the whole beta/RC period just before the 3.2 release. So I think that in the end something is still wrong with the FC. Of course, I hope I am wrong.
Yea I am getting real suspicious that this is not our problem but rather possibly a UBNT problem.
-
Eric Stern - Employee
- Posts: 532
- Joined: Wed Apr 09, 2014 9:41 pm
- Location: Toronto, Ontario
- Has thanked: 0 time
- Been thanked: 130 times
Re: Dropping ports on new WS, what is wrong with my setup?
tma wrote:Okay, I see what you mean. So FC frames aren't switched by themselves but they may cause a domino effect. Would the switch consider these pause frames for the graph, though?
Yes, we do. It's possible Ubiquiti doesn't though. That would explain the situation someone described earlier where the switch showed a large amount of packets being received but the airFIBER didn't show any traffic being sent.
You could make a reasonable argument that pause frames should not be counted, as they are not traffic. However they are real frames that take up real bandwidth on the wire.
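To illustrate the "real frames, real bandwidth" point, here is a rough sketch of a standard 802.3x PAUSE frame and of what a stream of minimum-size frames costs on the wire. The field values (destination 01:80:C2:00:00:01, EtherType 0x8808, opcode 0x0001) come from the IEEE standard; the source MAC and helper names are made up for the example. The 15 Kpps figure is the symptom reported earlier in this thread, and treating those packets as pause frames is only an assumption.

```python
import struct

PAUSE_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_NO_FCS = 60       # minimum Ethernet frame, excluding the 4-byte FCS
FCS_BYTES = 4
PREAMBLE_AND_GAP_BYTES = 8 + 12  # preamble/SFD plus inter-frame gap


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return a PAUSE frame (without FCS), zero-padded to the minimum size."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    header = PAUSE_DST_MAC + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta
    )
    return header.ljust(MIN_FRAME_NO_FCS, b"\x00")


def bandwidth_bps(pps: int, frame_bytes: int) -> int:
    """Bits per second consumed by `pps` frames of `frame_bytes` each."""
    return pps * frame_bytes * 8


if __name__ == "__main__":
    frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)  # hypothetical MAC
    on_link = len(frame) + FCS_BYTES                 # 64 bytes as counted by most graphs
    on_wire = on_link + PREAMBLE_AND_GAP_BYTES       # 84 bytes of actual line time
    print(bandwidth_bps(15_000, on_link) / 1e6, "Mbps counted")
    print(bandwidth_bps(15_000, on_wire) / 1e6, "Mbps of line time")
```

At 15,000 minimum-size frames per second this works out to about 7.7 Mbps, which is at least consistent with the 8 Mbps / 15 Kpps symptom mentioned above, though it does not prove those packets were pause frames.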
- IntL-Daniel
- Experienced Member
- Posts: 170
- Joined: Mon Nov 02, 2015 5:07 pm
- Location: Czech Republic
- Has thanked: 7 times
- Been thanked: 9 times
Re: Dropping ports on new WS, what is wrong with my setup?
sirhc wrote:IntL-Daniel wrote:My bet is on an issue with the AF and its LAN port. At least there is a known issue of CRC errors being generated by internal CPU-to-CPU communication on all AF units. There was also an issue of non-working FC during the whole beta/RC period just before the 3.2 release. So I think that in the end something is still wrong with the FC. Of course, I hope I am wrong.
Yea I am getting real suspicious that this is not our problem but rather possibly a UBNT problem.
Unfortunately there is no way to discuss this with the UBNT team led by UBNT-Chuck there. He constantly rejects and closes any discussion about CRC errors generated by AF units.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
IntL-Daniel wrote:Unfortunately there is no way to discuss this with the UBNT team led by UBNT-Chuck there. He constantly rejects and closes any discussion about CRC errors generated by AF units.
Well I just Skyped Matt Hardy over at UBNT with a link to this thread and asked him to run it up the UBNT flagpole and see who salutes. We shall see.
-
bmv - Member
- Posts: 30
- Joined: Sun Aug 23, 2015 4:07 am
- Location: Dorset/Wiltshire, UK
- Has thanked: 5 times
- Been thanked: 12 times
Re: Dropping ports on new WS, what is wrong with my setup?
I'd like to add that we are getting what potentially could be the same problem on our network, our second busiest POP which runs the following:
Mikrotik (FC auto on all ports) RB1100AHx2 - 6.32.3
Netonix (FC enabled on all ports) WS-24-400A - 1.3.9
AF5X radios (FC enabled) - mixture of 3.2-rc2.28535 and 3.2
We have 4 AF5X radios connected to this switch along with a load of other RocketM5 and RocketTi radios.
We leave FC enabled by default on all ports.
It's a busy POP for us, and we have a LACP link between the Netonix and the Mikrotik for load balancing.
The problem we have been having is that we lose all comms over the LACP link, which includes management to the switch and our PPP downstream customers.
However we don't lose traffic for some ports we have configured up as mid-span injectors for some of our P2P links that have a native VLAN on them and basically provide POE to the radio and a link to a dedicated ETH port on the router.
One of these P2P links is an AF5X link, the others are RocketTi.
Traffic continues on these links when we have the problem and OSPF doesn't drop, but the LACP link dies, and our PPP customers and management access to the switch die with it, as these route via the LACP link.
What we have discovered is this problem mainly affects our LACP link, as we have configured a 'backdoor' for management access on another port between the switch and the router. When we move the management VLAN on our router to this backdoor port, we regain access to the switch. However disabling/enabling the LACP ports does not fix this. Only a reboot via the GUI or in person brings the LACP link back up. We have disabled ports on the Mikrotik/Netonix side without success. A reboot of the Mikrotik doesn't fix this either. ONLY a reboot of the Netonix fixes this.
We Layer3 route our network, however this site is a very busy access POP, so it has a big L2 segment to it.
We had planned to split this up, but it took this event to make us do it, thinking the two were related. However I doubt it now from reading this and from some of our results following segregation.
We have started to divide the network in half at this site, so we have another Mikrotik and another WS-12-250-AC.
The plan is to 50/50 the network.
We are half way there such that the native VLAN is separate.
However there was still a patch between the two Netonix switches running the management VLAN. That was all. No other traffic. FC was enabled on the port though.
We had the issue again for 5 minutes, but it sorted itself out automatically unlike last week's issue which required a physical on-site power cycle.
We lost LACP on both switches to each Mikrotik at the same time!
So the problem manifested from one switch to the other.
Things we do need to check:
-Earthing/grounding
-Config is not corrupted - factory reset and manually do the config again
-Swap out the 24-port switch
We have many tens of other sites with Netonix switches, about a dozen with LACP configured, and about half a dozen with AF5X.
At all the sites we get the CRC errors!
However we don't get this complete meltdown.
Each of the other sites is considerably less complicated. A much smaller L2 domain. Fewer customers. Lower traffic amounts.
I can't remember when this started, but we have been having STP issues such that we know there are no loops, but STP kicks in and takes ports down.
This may be related, but we have since disabled STP and still have these problems.
Anyway, this is my input into this....
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
bmv wrote:I'd like to add that we are getting what potentially could be the same problem on our network, our second busiest POP which runs the following:
Mikrotik RB1100AHx2 - 6.32.3
Netonix WS-24-400A - 1.3.9
We have 4 AF5X radios connected to this switch along with a load of other RocketM5 and RocketTi radios.
We leave FC enabled by default on all ports.
It's a busy POP for us, and we have a LACP link between the Netonix and the Mikrotik for load balancing.
Well this is what I would do if I experienced anything similar:
1) Upgrade to v1.4.0rc16, not because it fixes this issue (we think it may be a UBNT issue), but because I just like it better and it does fix some other issues.
I am running v1.4.0rc16 on my WISP towers, it is fine. In fact I think it is our best firmware yet. Just leave Discovery Disabled on the Device/Configuration Tab.
2) DISABLE Flow Control on any port going to an AF radio.
3) Leave Flow Control ENABLED for all other ports not connected to AF radios.
Try these suggestions and see if your problems go away.
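Suggestions 2) and 3) amount to a simple per-port rule, which can be sketched as below for anyone keeping a port inventory in a script or spreadsheet. This is only an illustration: the `Port` type, the labels, and the "label starts with AF" convention are all assumptions, and nothing here talks to the switch. The setting itself still has to be changed per port in the switch configuration.

```python
from dataclasses import dataclass

# Assumption: airFIBER radios are labelled with a leading "AF" (e.g. "AF5X").
AF_LABEL_PREFIXES = ("AF",)


@dataclass(frozen=True)
class Port:
    number: int
    device: str  # free-text label of whatever is plugged into the port


def flow_control_enabled(port: Port) -> bool:
    """Flow Control off for AF radios, on for everything else."""
    return not port.device.upper().startswith(AF_LABEL_PREFIXES)


def flow_control_plan(ports: list[Port]) -> dict[int, bool]:
    """Map each port number to the suggested Flow Control setting."""
    return {p.number: flow_control_enabled(p) for p in ports}


if __name__ == "__main__":
    # Hypothetical inventory, loosely modelled on the site described above.
    inventory = [
        Port(1, "AF5X"),
        Port(2, "RocketM5"),
        Port(3, "RocketTi"),
        Port(4, "Mikrotik RB1100AHx2"),
    ]
    for number, enabled in flow_control_plan(inventory).items():
        print(f"port {number}: Flow Control {'ENABLED' if enabled else 'DISABLED'}")
```

The prefix match is deliberately crude, so any label that happens to start with "AF" would be treated as an airFIBER radio. Adjust the rule to whatever naming your own inventory uses.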