Dropping ports on new WS, what is wrong with my setup?

sirhc
Employee
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Dropping ports on new WS, what is wrong with my setup?

Tue May 03, 2016 10:09 pm

mayheart wrote: I have a similar problem with Cisco and Airfiber units.

Having flow-control and RSTP enabled will cause the interface to randomly lock up; shutting it down and bringing it back up is the only way to get it working again. The interface will report 0 TX packets and lots of RX packets. Disabling flow-control & spanning-tree fixed the problem, but that's not a good solution.

Perhaps the Airfiber units are sending some funky traffic when a certain situation occurs?

It's a shot in the dark but the situation looks similar. I just use Netonix for access points & PoE switches up towers when height is an issue.


Well, this has been tossed around internally; we are not sure if this is us or something else, "possibly" AF radios in certain configurations.

Thank you for your input; this is the type of information we need to nail this down. Please provide as much detail as you can about your situation.

It does appear that most reported cases involve an airFIBER radio, but we are not certain that all cases have been AF radios, or what the exact configurations were.
Support is handled on the Forums not in Emails and PMs.
Before you ask a question, use the Search function to see if it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.

nickwhite
Member
Posts: 45
Joined: Fri Jul 03, 2015 10:17 am
Location: Austin, Texas
Has thanked: 9 times
Been thanked: 2 times

Tue May 03, 2016 11:12 pm

So far I'm 6 hours without a crash. This might be the longest since Friday night.

A couple of weeks ago, when this happened on our #7 POP (the same one that randomly crashed this afternoon), one of the POPs feeding it had a ToughSwitch running RSTP, and I did briefly observe the 15,000 PPS on that interface while troubleshooting and rebooting things. It is an AF5 link. I have never seen it on any other ToughSwitch. This issue has always involved an AirFiber and has always been observed on a Netonix port. In case it matters, I think I first observed this around the time the 3.2 betas came out for AirFibers.

On a side note: we installed a solar POP with an AC AP and an AF5x PTP for a new construction site. The feed from the customer's NOC comes in on the AF5x to a Netonix, and from the Netonix it goes to an AC PTP and an AC AP. This was completely separate from our network until about a week ago, when we installed a backhaul so that we could monitor it. They have had "lock ups" about every 4-6 days for at least the last 3 weeks, sometimes once per day. I think the same issue may be affecting that. I don't know if they're running RSTP on those links, but I will find out.

nickwhite
Member

Tue May 03, 2016 11:28 pm

sirhc wrote:We are not even sure this is us, could be another device interacting with us poorly.

I'd put my money on this... it wouldn't be the first time UBNT didn't play nice with other gear. I think Flow Control wasn't enabled on the AFs until version 3.2. Not sure if it was there in 2.X; I don't think it was.

sirhc wrote:What we would really like to know is what that mystery traffic is but port mirror will not work as certain types of packets such as BPDU and pause frames are not mirrored. And who is generating these packets?

I tried a laptop with Wireshark yesterday... as soon as my tech plugged in his laptop the strange traffic showed up on the port he was plugged into. The really strange thing was that after he unplugged his laptop, the port showed down, but was still sending about 140Kbps of traffic according to the graphs, until I rebooted the switch and cleared everything.

Could this be caught by running tcpdump or similar on the Netonix?
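For reference, the frames in question would most likely be IEEE 802.3x PAUSE frames: MAC Control frames with EtherType 0x8808 and opcode 0x0001, followed by a 16-bit pause time, usually addressed to the reserved multicast 01:80:C2:00:00:01. Many Ethernet MACs consume these in hardware, which fits with them not appearing on a mirror port; whether a capture tool running on the switch itself could see them is an open question. The sketch below (the function name is hypothetical) only shows how such a frame could be recognized in raw captured bytes, assuming an untagged frame from a capture point that actually passes MAC Control frames up.

```python
import struct
from typing import Optional

MAC_CONTROL_ETHERTYPE = 0x8808                   # IEEE 802.3 MAC Control
PAUSE_OPCODE = 0x0001                            # PAUSE
PAUSE_MULTICAST = bytes.fromhex("0180c2000001")  # usual PAUSE destination

def parse_pause_frame(frame: bytes) -> Optional[int]:
    """Return the pause time (in quanta) if `frame` is an untagged
    802.3x PAUSE frame, otherwise None. `frame` starts at the dst MAC."""
    if len(frame) < 18:  # dst(6) + src(6) + type(2) + opcode(2) + time(2)
        return None
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return quanta
```

A returned pause time of 0 means "resume transmitting"; any non-zero value asks the link partner to stop sending for that many quanta.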

yahel
Member
Posts: 54
Joined: Wed May 27, 2015 12:07 am
Location: Berkeley, CA
Has thanked: 14 times
Been thanked: 11 times

Tue May 03, 2016 11:53 pm

I just came across this thread today and have yet to take a deep dive into what's being reported...
That said, I should add that we've also seen lots of crazy behavior that I believe is related to RSTP.
Specifically, ports that stop forwarding.

We've recently disabled all storm control options on all switches and I believe things have been stable since.
Perhaps unrelated, as many other things have changed during the same time frame.

We do have Cisco and Ubiquiti discovery enabled.

Hope this helps and does not confuse further.

Good luck tracking this down!

Yahel.

tma
Experienced Member
Posts: 122
Joined: Tue Mar 03, 2015 4:07 pm
Location: Oberursel, Germany
Has thanked: 15 times
Been thanked: 14 times

Tue May 03, 2016 11:59 pm

I have reported about a VERY SIMILAR problem in this post:
http://forum.netonix.com/viewtopic.php?f=17&t=1022#p7949

If it is the same problem (and judging by the characteristics of the 8 Mbps / 15 kpps stream and other details reported here, I think it is), you should consider:

** downgrading firmware will not help; it was present on 1.3.2 and maybe even earlier
** it just takes two Netonix switches and a cable between them, nothing else
** traffic level is not the trigger, it happened on idle switches
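A quick sanity check ties two figures from this thread together: nickwhite saw about 15,000 PPS, and tma describes an ~8 Mbps stream. Assuming (this is not confirmed anywhere in the thread) that the stream consists of minimum-size 64-byte frames, as PAUSE frames are, the arithmetic lands close to 8 Mbps. The function below is a hypothetical helper for that calculation.

```python
PREAMBLE_SFD_BYTES = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum inter-frame gap

def bits_per_second(pps: int, frame_bytes: int = 64, on_wire: bool = False) -> int:
    """Bit rate of a packet stream. `frame_bytes` includes the FCS; set
    on_wire=True to also count preamble/SFD and inter-frame gap."""
    overhead = PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES if on_wire else 0
    return pps * (frame_bytes + overhead) * 8

print(bits_per_second(15_000))                # 7680000
print(bits_per_second(15_000, on_wire=True))  # 10080000
```

Port counters and traffic graphs usually count frame bytes only, so a graph would show roughly the first figure (about 7.7 Mbps) rather than the on-wire 10 Mbps.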

However, with that simplistic lab setup I was never able to reproduce the problem, so I stopped reporting about it, also because Chris kept telling me I was doing something wrong with RSTP and LAGs, although the lab setup involved neither, and I wanted to come up with a super easy recipe to prove it first. FWIW, the switches in the field that showed the problem of 8 Mbps of bogus traffic taking the switch down (one of them) have never had it again since. And the two where I first saw it still run 1.3.2 fine.

In my network, the trigger seems to be missing: it happened only when we connected pairs of Netonix switches for the first time and has never occurred since. However, the cases reported here seem to have a trigger similar to connecting two Netonixes with a cable. It might be the link state changing (due to STP activity), flow control stalling packets, or both interacting. I avoid STP as much as I can. With an OSPF-based routed network that is always doable, and that may have saved me so far.

However, once the trigger occurs, another unrelated (active) port might get into this state and inject the 8 Mbps stream all by itself until you unplug that port, forcing it down so it can't inject packets anymore. Eric, if I may suggest something: have a look at conditions that might cause a port to send out packets it has just received on that very same port (or at the internal mechanisms that prevent such unwanted behavior), specifically broadcast packets like ARPs. I guess you agree that even broadcasts should never go back out the port they were received on, although ... that should be a function of the silicon, shouldn't it?
--
Thomas Giger
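The rule Thomas describes (a bridge floods broadcasts and unknown unicasts to every port except the one the frame arrived on) can be illustrated with a toy learning switch. This is a simplified model of a single VLAN with no STP or LAGs, not how Netonix silicon is actually implemented; the class and method names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"

@dataclass
class LearningSwitch:
    """Toy model of one VLAN on a transparent bridge (no STP, no LAGs)."""
    ports: set[int]
    mac_table: dict[str, int] = field(default_factory=dict)

    def egress_ports(self, ingress: int, src: str, dst: str) -> set[int]:
        src, dst = src.lower(), dst.lower()
        # Learn (or re-learn) which port the source address lives on.
        self.mac_table[src] = ingress
        if dst != BROADCAST and dst in self.mac_table:
            # Known unicast: forward only to the port it was learned on.
            out = {self.mac_table[dst]}
        else:
            # Broadcast or unknown unicast: flood to every port...
            out = set(self.ports)
        # ...but never back out the ingress port. If the destination was
        # learned on the ingress port, nothing is left: the frame is filtered.
        out.discard(ingress)
        return out
```

A real bridge additionally never forwards frames sent to the reserved 01:80:C2:00:00:0x addresses, which includes PAUSE frames, so those should only ever exist on a single link.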

tma
Experienced Member

Wed May 04, 2016 12:13 am

Actually, reading my older post (http://forum.netonix.com/viewtopic.php?f=17&t=1022#p7949) again, I found I had observed and reported this:

"Next, I wanted to see the effect via management on switch 2, and so I connected myself to port 5 on switch 2. But when I did that, the ping-ponging stopped immediately and I'm now able to ping both switches again."

That would indicate: Any change of link state on any (unrelated) port may cause the misbehaving ports to stop their ping-ponging. That was observed in this thread too. I just want to emphasize that this (unrelated) port is not part of the problem - it is not part of a loop and actually there's no loop whatsoever - it is a ping-ponging going on and when some port changes link state, the switch seems to reconsider what it is doing on all ports, thereby fixing the ping-ponging.

nickwhite
Member

Wed May 04, 2016 9:19 am

tma wrote:Actually, reading my older post (http://forum.netonix.com/viewtopic.php? ... 1022#p7949) again, I found I had observed and reported this:

"Next, I wanted to see the effect via management on switch 2, and so I connected myself to port 5 on switch 2. But when I did that, the ping-ponging stopped immediately and I'm now able to ping both switches again."

That would indicate: Any change of link state on any (unrelated) port may cause the misbehaving ports to stop their ping-ponging. That was observed in this thread too. I just want to emphasize that this (unrelated) port is not part of the problem - it is not part of a loop and actually there's no loop whatsoever - it is a ping-ponging going on and when some port changes link state, the switch seems to reconsider what it is doing on all ports, thereby fixing the ping-ponging.


Do you know if you had Flow Control enabled or not?


I'm now over 18 hours without a crash since turning Flow Control off yesterday afternoon.

mayheart
Experienced Member
Posts: 166
Joined: Thu Jan 15, 2015 1:42 pm
Location: Canada
Has thanked: 43 times
Been thanked: 40 times

Wed May 04, 2016 10:50 am

Configuration-wise, I run a fully layer 3 network. No VLAN loops are possible. It's not a spanning-tree problem.

Example config on a Cisco switch port facing an airFiber:

 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 13,975,976,992
 switchport mode trunk
 flowcontrol receive desired
 hold-queue 512 in
 hold-queue 512 out



The bogus traffic coming in on the airFiber port happens with flow control enabled during peak hours. I can't reproduce this problem with Rocket M5s, SAF Luminas or Ceragon microwave units using the same switch config.

Total capacity of this link is 357/140 Mbps. I'm doing at most 170/40 Mbps at night, so it's not a congestion problem.

The UBNT guys did see a lot of pause frames on the AF radios themselves. I have increased the input/output queues on my switches to help with flows, and the interface statistics on the switch show no packets dropped due to buffer starvation.

I've been back and forth with UBNT's engineers on this problem, still no solution.

Shutting down the port and bringing it back up or a power cycle of the AF unit will stop the bogus RX traffic and restore the normal flow.

sirhc wrote:
mayheart wrote: I have a similar problem with Cisco and Airfiber units.

Having flow-control and RSTP enabled will cause the interface to randomly lock up; shutting it down and bringing it back up is the only way to get it working again. The interface will report 0 TX packets and lots of RX packets. Disabling flow-control & spanning-tree fixed the problem, but that's not a good solution.

Perhaps the Airfiber units are sending some funky traffic when a certain situation occurs?

It's a shot in the dark but the situation looks similar. I just use Netonix for access points & PoE switches up towers when height is an issue.


Well, this has been tossed around internally; we are not sure if this is us or something else, "possibly" AF radios in certain configurations.

Thank you for your input; this is the type of information we need to nail this down. Please provide as much detail as you can about your situation.

It does appear that most reported cases involve an airFIBER radio, but we are not certain that all cases have been AF radios, or what the exact configurations were.

sirhc
Employee

Update on this issue

Wed May 04, 2016 1:17 pm

At this point we are still investigating this issue, but it is looking more and more like this "may" "possibly" be a UBNT firmware issue.

It "appears" that, for whatever reason, the airFIBER radios start sending THOUSANDS of pause frames per second at the switch port connected to the airFIBER. This does not lock up the switch; instead, it basically leaves the switch inaccessible from that port and from whatever port was sending the data stream to that port, as they are now paused indefinitely.
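To put "thousands of pause frames per second" in perspective: in 802.3x, one pause quantum is 512 bit times, and a single PAUSE frame can request up to 0xFFFF quanta. Assuming a 1 Gbps link and the maximum pause time (the thread does not report the actual pause times the radios send), one frame halts transmission for about 33.6 ms, so only about 30 such frames per second would keep a port paused continuously. The helper names below are hypothetical.

```python
QUANTUM_BIT_TIMES = 512  # one pause quantum = 512 bit times (IEEE 802.3x)

def pause_seconds(quanta: int, link_bps: float) -> float:
    """How long a single PAUSE frame halts transmission on a link."""
    return quanta * QUANTUM_BIT_TIMES / link_bps

def frames_per_second_to_stay_paused(quanta: int, link_bps: float) -> float:
    """Minimum PAUSE frame rate that keeps the link partner paused."""
    return 1.0 / pause_seconds(quanta, link_bps)

print(pause_seconds(0xFFFF, 1e9))                     # about 0.0336 s
print(frames_per_second_to_stay_paused(0xFFFF, 1e9))  # about 29.8 frames/s
```

With small pause times the same frame rate would pause the port for only a fraction of each second, so the effect depends on what the radios actually put in those frames.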

At this point I think we need to start looking at what firmware versions people are running on their AF radios that are experiencing the issue. I am not having this issue on any of my switches/airFIBER links.

I am running the following firmware:
airFIBER 24 radios firmware v3.2
AF5X radios firmware v3.2-rc2.28535

We are looking at putting in a safety mechanism whereby the switch CPU monitors each port, and if it detects this Pause Frame Storm it will disable Flow Control on that port and then make a log entry explaining why it did so.
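The safety mechanism described above could look roughly like the Python sketch below. It is not Netonix's implementation: the source of the per-port received-PAUSE counter, the `disable_flow_control` and `log` hooks, and the threshold values are all hypothetical. Requiring several consecutive over-threshold samples is one way to avoid tripping on a legitimate short burst of pause frames.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

@dataclass
class PauseStormGuard:
    """Hypothetical per-port watchdog: trips when the received PAUSE
    counter rises faster than `threshold_pps` for `consecutive` samples."""
    threshold_pps: float
    consecutive: int
    disable_flow_control: Callable[[int], None]  # hypothetical switch hook
    log: Callable[[str], None]                   # hypothetical log hook
    _last: Dict[int, Tuple[int, float]] = field(default_factory=dict, init=False)
    _hits: Dict[int, int] = field(default_factory=dict, init=False)
    _tripped: Set[int] = field(default_factory=set, init=False)

    def sample(self, port: int, rx_pause_count: int, t: float) -> bool:
        """Feed one counter reading taken at time `t` (seconds).
        Returns True only on the sample that trips the guard."""
        prev = self._last.get(port)
        self._last[port] = (rx_pause_count, t)
        if prev is None or port in self._tripped:
            return False
        count0, t0 = prev
        delta, dt = rx_pause_count - count0, t - t0
        if dt <= 0 or delta < 0:
            # Clock glitch or counter reset/wrap: start over for this port.
            self._hits[port] = 0
            return False
        rate = delta / dt
        if rate >= self.threshold_pps:
            self._hits[port] = self._hits.get(port, 0) + 1
        else:
            self._hits[port] = 0
        if self._hits[port] < self.consecutive:
            return False
        # Flow Control stays off afterwards, so this fires once per port.
        self.disable_flow_control(port)
        self._tripped.add(port)
        self.log(f"port {port}: PAUSE storm ({rate:.0f} frames/s), flow control disabled")
        return True
```

Usage would be a periodic loop that reads each port's counter and calls `guard.sample(port, count, time.monotonic())`.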

When we turn Flow Control OFF for that port, Ethernet communication breaks momentarily, only for a second or two, but if you are running higher-level routing protocols over that port, such as OSPF, the adjacency will go down, which could cause an outage until the adjacency recovers and OSPF converges. Flow Control will then remain OFF, so this should only happen once, and you do not have to go out and reboot the switch. As tma (Thomas) pointed out, simply unplugging the cable and plugging it back in would also fix the issue, but that reboots the airFIBER radio.

SO MY ADVICE IS AS FOLLOWS:
Report what AF firmware versions you are running if you experience this issue.

If you encounter this problem simply turn Flow Control OFF on the port connected to the airFIBER radios and the problem should go away.

If turning Flow Control OFF solves your issue report that it did.

We will continue to investigate this. We are not blaming UBNT at this time, but we suspect that this "may" be an AF firmware issue and not us.

Keep in mind that since we started reporting on the Status Tab of our switches whether Flow Control is really active (a few months ago), UBNT and other vendors like MIMOSA have been going back and enabling Flow Control on many of their devices, such as radios and routers, because it was not really active.

WisTech
Associate
Posts: 213
Joined: Mon Aug 04, 2014 3:57 pm
Has thanked: 8 times
Been thanked: 64 times

Wed May 04, 2016 1:33 pm

Chris, I'm running a new developmental build for NxN improvements on the 5X. I'll disable FC and see what happens.

EDIT - It has to be something screwy with the other end. I disabled FC and still can't get to one of my units or the remote units, but the MAC table on one of the ports shows the IP for the remote 6 mini.
