Dropping ports on new WS, what is wrong with my setup?

sirhc
Employee
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 12:19 pm

tma wrote:
sirhc wrote:We are open to other suggestions on this issue, but the common factor seems to be AF radios with Flow Control, and the packets appear to be coming from the AF radios to the switch in very large numbers.


The reason I'm questioning this explanation is that I have seen two Netonix switches connected by a cable - nothing else connected and no AF in between - sending this 8 Mbps stream from one switch to the other (one way).


I was under the impression that this was something you saw months ago and were not able to replicate?

That what you saw happened during your labs with LAGs, which could have been a loop, as you did discover a bug in the way we were handling LAGs.

Are you now saying that when this event occurred, which I think you said you could not replicate, you were sure they were pause frames and not broadcast packets?
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.

sirhc
Employee

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 12:30 pm

Then there is this post from mayheart: viewtopic.php?f=17&t=1654&start=90#p12540

mayheart wrote:Configuration-wise, I run a complete layer 3 network. No VLAN loops can possibly happen. It's not a spanning-tree problem.

Example of config on a switch facing an airfiber

Code:
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 13,975,976,992
switchport mode trunk
flowcontrol receive desired
hold-queue 512 in
hold-queue 512 out



The bogus traffic coming in on the airfiber port happens with flow control enabled and at peak traffic times. I can't reproduce this problem with Rocket M5s, SAF Luminas and Ceragon microwave units with the same switch config.

Total capacity of this link is 357/140 Mbps. I'm doing at most 170/40 Mbps at night, so it's not a problem with congestion.

The UBNT guys did see a lot of pause frames on the AF radios themselves. I have the input/output queues increased on my switches to help with flows. Interface statistics on the switch show no packets being dropped from buffer starvation.

I've been back and forth with UBNT's engineers on this problem, still no solution.

Shutting down the port and bringing it back up or a power cycle of the AF unit will stop the bogus RX traffic and restore the normal flow.


mayheart wrote:I have a similar problem with Cisco and Airfiber units.

Having flow-control and RSTP enabled will cause the interface to randomly lock up; shutting it down and bringing it back up is the only way to get it to work again. The interface will report 0 TX packets and lots of RX packets. Disabling flow-control and spanning-tree fixed the problem, but that's not a good solution.

Perhaps the Airfiber units are sending some funky traffic when a certain situation occurs?

It's a shot in the dark but the situation looks similar. I just use Netonix for access points & PoE switches up towers when height is an issue.
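
The symptom mayheart describes (0 TX packets and lots of RX packets on the locked-up interface) can be watched for from polled counters. Below is a minimal Python sketch, assuming you already have some way of sampling per-port RX/TX packet counters. The `CounterSample` type, the `looks_locked_up` helper and the threshold value are made up for illustration and are not part of any Netonix, Cisco or UBNT tool.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a port's packet counters (hypothetical structure)."""
    rx_packets: int
    tx_packets: int


def looks_locked_up(prev: CounterSample, curr: CounterSample,
                    rx_threshold: int = 1000) -> bool:
    """Return True when TX did not advance at all while RX grew a lot.

    rx_threshold is an arbitrary illustrative cut-off, not a vendor value.
    """
    rx_delta = curr.rx_packets - prev.rx_packets
    tx_delta = curr.tx_packets - prev.tx_packets
    return tx_delta == 0 and rx_delta >= rx_threshold


# Made-up counter readings taken one polling interval apart.
before = CounterSample(rx_packets=5_000_000, tx_packets=4_200_000)
after = CounterSample(rx_packets=5_250_000, tx_packets=4_200_000)
print(looks_locked_up(before, after))  # True
```

This only flags the condition so the port can be looked at while it is still happening; it says nothing about what is sending the RX traffic.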

tma
Experienced Member
Posts: 122
Joined: Tue Mar 03, 2015 4:07 pm
Location: Oberursel, Germany
Has thanked: 15 times
Been thanked: 14 times

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 12:56 pm

sirhc wrote:I was under the impression that this was something you saw months ago and were not able to replicate? That what you saw happened during your labs with LAGs, which could have been a loop, as you did discover a bug in the way we were handling LAGs. Are you now saying that when this event occurred, which I think you said you could not replicate, you were sure they were pause frames and not broadcast packets?


It happened 4 times "out of the blue" and I wasn't able to reproduce it at will, correct. OTOH, nobody else here has been able to trigger the problem at will or we wouldn't speculate anymore ;-). The first and second times, a LAG was configured on the switch, so I thought this was due to a loop forming within the LAG. That was wrong, because the third occurrence was with one cable between two Netonix switches in default configuration. The fourth case was not reported here because I had to quickly disconnect everything in order not to disturb our LAN, and so I wasn't able to see the details again.

The problem that was fixed was a different one. It was easily reproducible and required STP in the mix. It was a real broadcast storm and generated a much higher data rate than just 8 Mbps.
--
Thomas Giger
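
To put the 8 Mbps figure in perspective, here is a rough back-of-the-envelope calculation in Python. It rests on assumptions that are not confirmed anywhere in this thread: that the stream really was made of 802.3x PAUSE frames, that each frame was the minimum 64 bytes, that the 8 Mbps was counted over frame bytes only (no preamble or inter-frame gap), and that the link ran at 1 Gbps. The function names are illustrative.

```python
FRAME_BYTES = 64      # assumed: minimum Ethernet frame size including FCS
QUANTUM_BITS = 512    # one pause quantum is 512 bit times (IEEE 802.3x)
MAX_QUANTA = 0xFFFF   # largest pause_time value a single PAUSE frame can carry


def frames_per_second(stream_bps: float, frame_bytes: int = FRAME_BYTES) -> float:
    """Frames per second needed to fill stream_bps with frames of frame_bytes."""
    return stream_bps / (frame_bytes * 8)


def max_pause_seconds(link_bps: float, quanta: int = MAX_QUANTA) -> float:
    """Longest time one PAUSE frame can ask the link partner to stay quiet."""
    return quanta * QUANTUM_BITS / link_bps


fps = frames_per_second(8_000_000)
hold = max_pause_seconds(1_000_000_000)
print(f"{fps:.0f} frames/s")                     # 15625 frames/s
print(f"{hold * 1000:.2f} ms per PAUSE frame")   # 33.55 ms per PAUSE frame
print(f"{1 / hold:.1f} frames/s to hold a pause")  # 29.8 frames/s to hold a pause
```

Under those assumptions, roughly 30 maximum-length PAUSE frames per second would be enough to keep a 1 Gbps port paused, so a stream of about 15,000 frames per second would be far more than flow control ever needs to send.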

sirhc
Employee

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 1:09 pm

I do not know, Thomas.

But I do know the following:
mayheart claims he can only replicate this with AF radios, and that it happens with Cisco switches as well as Netonix.
This problem started when UBNT fixed FC in the AF firmware.
People are only reporting this with AF radios using FC, multiple hops out on a flat segment.
People running multiple hops out with other radio brands and non-AF UBNT radios are not seeing this.
Disabling FC on the AF radios prevents this from happening.

If you can give me a way to replicate this with 2 Netonix and a straight cable we will dive right in there.

We are not giving up, nor have we said that this is definitely a UBNT issue and that we are not at fault, but we need more information. That said, the circumstantial evidence does tend to point towards a UBNT issue.


mayheart wrote:Example of Cisco config on a switch facing an airfiber

Code:
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 13,975,976,992
switchport mode trunk
flowcontrol receive desired
hold-queue 512 in
hold-queue 512 out



The bogus traffic coming in on the airfiber port happens with flow control enabled and at peak traffic times. I can't reproduce this problem with Rocket M5s, SAF Luminas and Ceragon microwave units with the same switch config.

Shutting down the port and bringing it back up or a power cycle of the AF unit will stop the bogus RX traffic and restore the normal flow.

bmv
Member
Posts: 30
Joined: Sun Aug 23, 2015 4:07 am
Location: Dorset/Wiltshire, UK
Has thanked: 5 times
Been thanked: 12 times

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 1:09 pm

FWIW since we have disabled the LAG between our WS-24-400A and MK 1100AHx2 we have not had a repeat issue. This was changed over a week ago. Before that we had a failure within 4 days of the last one and probably about 2 weeks and 1 week prior.

We have also started to split our large flat network at this site in two.
We are changing so much stuff it's hard to know what's going to fix this.

For us, we have been running 3.2 AF5X firmware for ages, and have this rolled out at different locations without issues.
It's just this one site which is busy.

The only difference for us is that the site where we have the problems has 4 AF5X radios, of which 3 are in the same L2 broadcast domain, and this L2 broadcast domain is large!!

tma
Experienced Member

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 1:17 pm

sirhc wrote:Supposedly there is a bug being fixed in the AF firmware now that is responsible for the excessive CRC errors, so maybe they are related?


These CRC errors, reported by IntL-Daniel and confirmed by UBNT, happen only if in-band-management is enabled and if one AF pings the other through the bridge from the CPU to the AF data channel (which is what in-band-management is all about), if the watchdog ping is enabled between AFs or if the CLI is used to ping manually. These CRC errors do not occur on the data channel "by themselves" - unless they come from a bad cable or the other typical sources of CRC errors. So these CRCs aren't excessive in any way. Think of them as 1 per second. That should not be related to FC activity unless it triggers something else which in turn starts an FC flood.
--
Thomas Giger

sirhc
Employee

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 1:41 pm

tma wrote:
sirhc wrote:Supposedly there is a bug being fixed in the AF firmware now that is responsible for the excessive CRC errors, so maybe they are related?


These CRC errors, reported by IntL-Daniel and confirmed by UBNT, happen only if in-band-management is enabled and if one AF pings the other through the bridge from the CPU to the AF data channel (which is what in-band-management is all about), if the watchdog ping is enabled between AFs or if the CLI is used to ping manually. These CRC errors do not occur on the data channel "by themselves" - unless they come from a bad cable or the other typical sources of CRC errors. So these CRCs aren't excessive in any way. Think of them as 1 per second. That should not be related to FC activity unless it triggers something else which in turn starts an FC flood.


Well, there is a bug, and who knows how one bug relates to another; I am grasping at straws here, Thomas. But I have complained that the AF24 radios generate CRC errors on the Ethernet port of the switch during rain events.

I can have an AF24 link that reports no CRC errors on the switch port for a week of clear weather, but then a rain event rolls through where the wireless modulates down or even drops out, and I then see CRC errors on the switch Ethernet port facing the AF24. Why is this? The rain is surely not affecting the Ethernet communications. We have gone as far as running different cables and ends and switching ports on the switch and router, and these mystery CRC errors still occur.

Also, I have an AF24 link near a competitor's Dragonwave 24GHz link (within 30 feet on a water tank railing), and when they play with their channels they cause CRC errors on the switch port facing the AF24 radio. Why? Even if they mess with the wireless packets, these packets should fail the CRC check on the far side radio before they are passed to the Ethernet PHY, or so you would think.
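
For anyone wondering what the switch port is actually checking when it counts a CRC error, here is a small Python sketch of the Ethernet frame check sequence (FCS). It is general Ethernet behaviour only and says nothing about what the AF24 does internally. The frame contents are made up, and `append_fcs` and `fcs_ok` are illustrative helper names.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Append the 4-byte Ethernet FCS (CRC-32, least significant byte first)."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 over the frame body and compare with the trailing FCS."""
    if len(frame_with_fcs) < 4:
        return False
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer


# Made-up 60-byte frame body (dst MAC, src MAC, EtherType, zero padding).
body = bytes.fromhex("ffffffffffff" "020000000001" "0800") + bytes(46)
good = append_fcs(body)
damaged = bytearray(good)
damaged[20] ^= 0x01  # flip a single bit somewhere in the payload

print(fcs_ok(good))            # True
print(fcs_ok(bytes(damaged)))  # False
```

The FCS is generated by whatever transmits the frame onto the cable, so a CRC error counted on the switch port means the bits that arrived did not match the FCS that arrived with them on that Ethernet hop.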

I am not sure what to say here. We do not "think" this is us, but we are asking for help, or for proof that we are doing something wrong that we need to fix.

So far everything that has been found points to this not being us.

At this point we just want something we can fix, or to get UBNT to fix something they can fix.

At least we are acknowledging that there is a problem "somewhere", and we are providing a lot more interaction in attempting to understand the issue at this point.

And we are providing a workaround: disabling FC on the AF radio.

tma
Experienced Member

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 1:43 pm

bmv wrote:For us, we have been running 3.2 AF5X firmware for ages, and have this rolled out at different locations without issues.
It's just this one site which is busy. The only difference for us is that the site where we have the problems has 4 AF5X radios, of which 3 are in the same L2 broadcast domain, and this L2 broadcast domain is large!!


That's why I think it is not as simple as every AF suddenly going into FC flood mode, or you'd see it everywhere. There must be something that triggers this behavior ... so what if the AF only reacts with FC frames to traffic that it receives - something that doesn't show in the graph? Nobody has really seen what happens on the wire between the switch and the AF.

Taking into account that it needs a larger L2 broadcast domain, what if the AF just relays FC frames that were generated on the remote side? (That is a question I have just asked on the UBNT forum.)

It might be helpful to find more common denominators such as: Are these AF devices being used with in-band-management? Has the multicast filter been enabled on the AF?
--
Thomas Giger
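
If someone does manage to capture what is on the wire between the switch and the AF, telling pause frames from broadcast traffic is straightforward. Here is a minimal Python sketch, assuming the capture gives you raw Ethernet frames starting at the destination MAC. The constants come from IEEE 802.3x (EtherType 0x8808, opcode 0x0001, reserved multicast address 01:80:C2:00:00:01); the `classify` function and the sample frames are made up for illustration.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
BROADCAST = bytes.fromhex("ffffffffffff")
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def classify(frame: bytes) -> str:
    """Classify a raw Ethernet frame (starting at the destination MAC, no preamble)."""
    if len(frame) < 14:
        return "truncated"
    dst = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == MAC_CONTROL_ETHERTYPE and len(frame) >= 18:
        opcode, quanta = struct.unpack("!HH", frame[14:18])
        if opcode == PAUSE_OPCODE:
            return f"pause (quanta={quanta})"
        return "mac-control (other opcode)"
    if dst == BROADCAST:
        return "broadcast"
    return "other"


src = bytes.fromhex("020000000001")  # made-up locally administered source address
pause = PAUSE_DST + src + struct.pack("!HHH", 0x8808, 0x0001, 0xFFFF) + bytes(42)
arp_like = BROADCAST + src + struct.pack("!H", 0x0806) + bytes(46)

print(classify(pause))     # pause (quanta=65535)
print(classify(arp_like))  # broadcast
```

One caveat: many NICs and switch ports consume MAC Control frames in hardware, so an ordinary capture on a host or a mirror port may never show them even when they are on the wire.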

sirhc
Employee

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 1:46 pm

My guess is MOST people use IN-BAND management, very few run an extra cable.

This brought up another post I made about the MAC address reported by the AF when using IN-BAND Management. Check it out.

tma
Experienced Member

Re: Dropping ports on new WS, what is wrong with my setup?

Mon May 09, 2016 1:54 pm

Chris, I fully agree on *other* types of CRC errors. I just wanted to say that the problem confirmed by UBNT is very specialized. Other than that, AF has a long history of CRC problems and packet loss (unconfirmed by UBNT). We had to RMA units from the first production runs and sometimes got back units which showed the problem again ... as if RMA stands for circulating units around until a customer is found who wouldn't notice ;-)
--
Thomas Giger
