I wanted to pull this out of the 1.4.9 thread since my issue likely has little to do with the version I'm running.
I want to go over my understanding of how the flow control/flood problem with Ubiquiti gear manifests, and why, to make sure I've got it right. Then I'll go over the config where I was seeing the issue.
So:
- Wi-Fi is not like wires: link speed changes, packets drop, and the connection is otherwise unstable in many real-world situations
- Some favor using flow control, some don't, and there are arguments for and against depending on how your network is set up; the point being that variable link speed means that if you use FC, it WILL fire often
- Ubiquiti's implementation tends to have bugs, and these bugs can cause a UBNT device to send a flood of FC "pause" packets to the connected switch
- If that connected switch has FC enabled, the flood of pause packets will essentially shut the switch port down
- If that port is how you reach your switch for management, you'll lose management access and remote logging
- If that connected device has a shared buffer for all ports, it's possible to "lock up" all ports on the switch
- If the Ubiquiti device stops sending a flood of pause packets, the switch will recover when the buffers empty
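For anyone who has not looked at what a "pause" packet actually is, here is a minimal sketch in Python of a standard IEEE 802.3x PAUSE frame. The destination MAC, EtherType, opcode and the 512-bit-time quantum come from the 802.3 standard; the function names and the example source MAC are just illustrative, and nothing here is specific to how UBNT or Netonix build or handle these frames.

```python
import struct

PAUSE_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808                 # EtherType for MAC Control frames
PAUSE_OPCODE = 0x0001                          # 802.3x PAUSE opcode
MIN_FRAME_NO_FCS = 60                          # 64-byte minimum frame minus the 4-byte FCS


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return a PAUSE frame (without FCS) asking the peer to stop for `quanta`."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in 16 bits")
    header = PAUSE_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    # Pad with zeros up to the Ethernet minimum frame size.
    return (header + payload).ljust(MIN_FRAME_NO_FCS, b"\x00")


def pause_seconds(quanta: int, link_bps: float) -> float:
    """One quantum is 512 bit times, so the pause length depends on link speed."""
    return quanta * 512 / link_bps


# Hypothetical source MAC, for illustration only.
frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
longest_pause_at_1g = pause_seconds(0xFFFF, 1e9)
```

A quanta value of 0 is the "resume" signal, which is why a device that stops pausing (or sends a zero) lets the switch drain its buffers again.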
General Questions:
- Can anyone describe in more detail how the Ubiquiti bug manifests and why the hell it is sending up to millions of PPS in pause frames? (saw that number mentioned in the forums). Is that not obviously excessive and something that they could cap as being obviously out of bounds?
- Is there a current list of known devices/firmware versions of UBNT gear that has this issue?
- In the Netonix switches there is a global setting to enable pause frame storm control. With this turned on (it appears to default on), why can this pause frame UBNT bug still lock the ports up?
- Is it true that just one device sending a flood can lock all ports up by filling the shared buffer and if so, do any of the larger switches have more than one buffer?
- With the Netonix switches being used almost exclusively by WISPs, why not default Flow Control to off and allow the more adventurous/knowledgeable to enable it if they want to risk it?
- What is a current case where enabling FC with Ubiquiti gear involved actually works?
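To put the "millions of PPS" figure in perspective, here is a rough back-of-the-envelope sketch. It assumes a 1 Gb/s port and a sender that asks for the maximum pause time (0xFFFF quanta) in every frame; the thread does not say what quanta value the UBNT gear actually sends, so treat that as an assumption. The function names are made up for this example.

```python
def max_min_size_frame_rate(link_bps: float) -> float:
    """Upper bound on frames/s for minimum-size frames.

    Each 64-byte frame also costs 8 bytes of preamble/SFD and a 12-byte
    inter-frame gap on the wire, so 84 bytes (672 bits) per frame.
    """
    return link_bps / ((64 + 8 + 12) * 8)


def pause_frames_per_second_to_hold(link_bps: float, quanta: int = 0xFFFF) -> float:
    """Frames/s needed to keep a peer continuously paused.

    One quantum is 512 bit times, so each frame buys quanta * 512 / link_bps
    seconds of silence; the required rate is the inverse of that.
    """
    return link_bps / (quanta * 512)


line_rate_pps = max_min_size_frame_rate(1e9)          # roughly 1.49 million
hold_rate_pps = pause_frames_per_second_to_hold(1e9)  # roughly 30
```

Under those assumptions, about 30 pause frames per second are enough to hold a gigabit port paused, while line rate for minimum-size frames tops out near 1.49 million per second. Anything in the millions is several orders of magnitude more than flow control would ever need.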
And finally, a question on my own issue... I was simply not seeing the switch recover on its own.
At the site where I experience the "lockup", I have a UBNT PowerBeam fed from a nearby sector (UBNT ac), with three VLANs bridged across from a Cisco L3 switch connected to that sector; this is the site backhaul. I then have two customers in the building that terminate on the switch, each coming in on its own tagged VLAN, which is then untagged at each customer's premises. I also have a small sector at this site (UBNT ac), which in turn has 2 customers connected via two more PowerBeams. Flow control was at default (on) on all ports. When I lost contact with the switch, even if I turned down the Ethernet port on the PowerBeam feeding the switch, turned off flow control on the PowerBeam, and let it sit for a bit, I would not get even a single ping from the switch on re-enabling it. A power cycle was the only option we had to regain remote access, and that would "unlock" the switch ports for minutes or hours. Replacing the switch with a new one did not fix anything. About 16 hours ago, I disabled flow control on all ports on the switch, and so far I have not seen it go unreachable.
Three recent changes to this site:
- Upgrade to 1.4.9. (downgraded to 1.4.8 today, just in case)
- Added a Metrolinq, which was running the first time this happened (the PowerBeam was disabled as I did not want to screw with RSTP) but was subsequently disabled for troubleshooting
- Upgraded all UBNT to 8.4.3, from 8.3.mumble
All of those happened at least a week before the problem surfaced.
Anyone want to take a stab at why the problem just started this week, well after any of the above changes? Or why other sites that have a very similar config with identical equipment and firmware are not seeing this problem?
Flow Control / Packet Locks overview
-
mike99 - Associate
- Posts: 837
- Joined: Tue Nov 25, 2014 10:53 am
- Location: Quebec, Canada
- Has thanked: 95 times
- Been thanked: 245 times
Re: Flow Control / Packet Locks overview
viewtopic.php?f=17&t=1823#p19239
The most important thing about flow control is that it must be one radio per router port; never use it on a flat layer 2 network where several devices see each other. Also, set your router and switch to obey, not both. The only port that should be set to both or generate is the router, so you can pass pause frames to the router.
A diagram would help with your problem. yEd or draw.io can be used for free to make one.
-
sporkman - Member
- Posts: 86
- Joined: Mon Jul 27, 2015 7:03 pm
- Location: New York, NY
- Has thanked: 8 times
- Been thanked: 11 times
Re: Flow Control / Packet Locks overview
Any description of how UBNT handles FC differently than other devices that don't have this issue? Is it a bug or a feature that feels like a bug?
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Flow Control / Packet Locks overview
sporkman wrote:Any description of how UBNT handles FC differently than other devices that don't have this issue? Is it a bug or a feature that feels like a bug?
There is a complete thread on this:
viewtopic.php?f=17&t=671&hilit=Flow+Control+UBNT#p4840
READ THE WHOLE THREAD AS THINGS CHANGE THROUGH THE COURSE OF THE THREAD
It is a long read, but this is definitely a BUG on UBNT's side. It is also a fact that Flow Control was never designed with wireless networks in mind, where capacity does not match physical link state and varies rapidly with external conditions such as, but not limited to, weather and interference.
viewtopic.php?f=17&t=671&p=22310&hilit=+Flow+Control#p22310
mayheart wrote:Sorry to be the bearer of bad news, latest 8.5 build did not fix the problem.
UBNT developer saying now 8.5.1 will have it fixed once and for all.... (Looking at early February for the release)
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see if it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.
-
mike99 - Associate
- Posts: 837
- Joined: Tue Nov 25, 2014 10:53 am
- Location: Quebec, Canada
- Has thanked: 95 times
- Been thanked: 245 times
Re: Flow Control / Packet Locks overview
Flow control has a flawed design; its only real use is to hand QoS over to the next device when a network device is slower than its Ethernet port (wireless, MoCA, EoP, etc.) and has crappy or no QoS. Even with a good router with good QoS, flow control still introduces problems for traffic that is really sensitive to latency, like fax.
Per-priority FC is way better, since a pause frame only affects one of the 8 queues instead of the whole NIC, but I have only seen this feature on 10 Gb/s devices.
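To illustrate the difference, here is a minimal sketch in Python of a per-priority pause frame (PFC, IEEE 802.1Qbb). It uses the same MAC Control EtherType as a regular pause frame but opcode 0x0101, followed by a priority enable vector and eight separate timers. The function name and the example source MAC are made up for illustration; this is not taken from any particular vendor's implementation.

```python
import struct
from typing import Mapping

MAC_CONTROL_DST = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101                              # priority-based flow control opcode
MIN_FRAME_NO_FCS = 60                            # 64-byte minimum frame minus the 4-byte FCS


def build_pfc_frame(src_mac: bytes, quanta_by_priority: Mapping[int, int]) -> bytes:
    """Return a PFC frame (without FCS) pausing only the listed priorities."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in quanta_by_priority.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("quanta must fit in 16 bits")
        enable_vector |= 1 << priority  # mark this priority's timer as valid
        timers[priority] = quanta
    header = MAC_CONTROL_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    return (header + payload).ljust(MIN_FRAME_NO_FCS, b"\x00")


# Pause only priority 3 for the maximum time; the other 7 queues keep flowing.
pfc = build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF})
```

With a regular 802.3x pause frame there is only one timer for the whole port, which is why a single congested flow can stall everything behind it.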
-
sporkman - Member
- Posts: 86
- Joined: Mon Jul 27, 2015 7:03 pm
- Location: New York, NY
- Has thanked: 8 times
- Been thanked: 11 times
Re: Flow Control / Packet Locks overview
I guess the point is moot since it's been 5+ days with no issues since I turned off FC on the problematic switch. Of course it's still enabled in a few other places with no issue, also same UBNT gear on same version so... ¯\_(ツ)_/¯ I need to look at the counters on those.
I still say default this to OFF on the switches and let people that care go through the debate as to whether it's appropriate for their site. This should be a blood-pressure-lowering decision for SirHC because it will reduce support posts. :)
I still don't totally get how this took the switch down for hours on end - ISO backhauling, and then an ac AP with 3 customers on it. I have to assume one of them was just blasting some weird traffic that didn't care whether it got ack'd on the remote end, or perhaps part of the UBNT bug is that it just blasts out pause frames for no good reason. I'm loath to dig through their forum.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Flow Control / Packet Locks overview
Use the search feature and read some posts
FC works in some places and not others because of topology, equipment, load, how good the links are, and how much traffic is slamming into the switch when Pause Frames occur
viewtopic.php?f=17&t=889&p=21999&hilit=+flow+control+buffers+lock#p21999
viewtopic.php?f=17&t=889&p=21965&hilit=+flow+control+buffers+lock#p21965
viewtopic.php?f=17&t=889&p=21402&hilit=+flow+control+buffers+lock#p21402
viewtopic.php?f=17&t=3003&p=20352&hilit=+flow+control+buffers+lock#p20352
-
sporkman - Member
- Posts: 86
- Joined: Mon Jul 27, 2015 7:03 pm
- Location: New York, NY
- Has thanked: 8 times
- Been thanked: 11 times
Re: Flow Control / Packet Locks overview
Yeah, I read that big thread and then searched on "packet lock" and read a bunch. I get the generalities here - hit the switch with a bunch of pause frames and all ports will lock up - got it. Other switches generally don't default to turning FC on, so the problem is hidden there - got it. There are scenarios where you can enable it and it works - got it. FC is a port-level, not VLAN-level thing, which is why you have your backhaul stuff and APs trunked apart from each other - got it.
UBNT bugginess - still not clear on what it is - is it that they shouldn't be defaulting to FC on? Are they sending pause when they shouldn't? Are they flooding pause packets when they shouldn't? I couldn't track that down in the mega-thread.
Also the "pause frames" under storm control was checked on my switch, but that still left the device unreachable for very long periods of time, so I'm wondering if I should bother with that or not?
I'm just lost on what my scenario was even though I know how it's fixed (turning FC off). Making it more complicated, the other sites have zero pause packets shown on any counters even when FC was enabled. Same setup - fed by a PB-AC-ISO, then a sector with a few high-bandwidth/commercial clients and no problems.
I get your point - something must have been sending traffic. But what kind of weird traffic is someone generating that is going to pummel the link to the point of port lockup when there's not even a path to the internet anymore? While I couldn't access the switch, I could reach the ISO that fed the site. Turning FC off on the UBNT did not "unlock" the ports, nor did turning the port down or isolating it in an empty VLAN. So the junk that was locking the port had to be coming from the Rocket AC or one of the 3 clients behind it, and that junk traffic could not be your standard TCP, since TCP would stop hitting the client/AP/switch when it failed to reach the internet (no ACKs, no traffic). Make sense? I'm not trying to blame the switch or anything, I just hate unsolved mysteries.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Flow Control / Packet Locks overview
sporkman wrote:I just hate unsolved mysteries.
A bad client link on the PTMP AP and a client on that AP using, say, BitTorrent could cause a flood of packets and cause Pause Frames to start?
All sorts of possibilities!!!
Welcome to the wonderful world of IT and WISP - LOL
I used to have a crystal ball for these cases but the Liberals took it thinking it could help with their Russia Russia Russia witch hunt....joke's on them as it only works to find real issues and cannot create an issue to find.
-
sporkman - Member
- Posts: 86
- Joined: Mon Jul 27, 2015 7:03 pm
- Location: New York, NY
- Has thanked: 8 times
- Been thanked: 11 times
Re: Flow Control / Packet Locks overview
Jeez, I thought us NorthEast liberals were the ones that were the snowflakes, where did that deep state complaint come from? :) Is the NSA tapping my MINI? I'm sure our totally honest dear leader will come out clean, because he's all about truth telling (we in NYC especially know that he's not at all mobbed-up and that he's in no way laundering mob money through his money-losing projects).
So my mystery has returned. Switch locks up, doesn't pass traffic. It happened Friday afternoon and then again about an hour ago tonight. It's on the roof in a box, so no idea what's on the console. A power cycle brings it back for a bit. What has changed after almost a month of no problems?
- The Metrolinq (2.5-60-19) was factory reset and its config loaded from backup. Suspecting that it was somehow part of the problem, I disabled the port, but did not power it down, after the first lockup on Friday afternoon.
- A Ubiquiti ISO-Station-ac was added to the Mini before that lockup on Friday
So tell me about other things that can leave the switch in a state where it won't forward packets but will keep the connected devices powered-up... Here's a few things that make me think this has nothing to do with flow control:
- flow control is disabled on all ports on the Mini
- broadcast/flow-control packet storm protection is enabled on the Mini
- in the past, shutting down ports on connected devices did not "unlock" the switch
- the biggest change Friday was a new device drawing more power, not any network changes
- all UBNT gear is running yet another version claiming to "fix" FC issues
- I've seen no issues with Metrolinq and FC discussed on the forums (yet)
I'm leaning towards power. I'm going to have our installer snap a pic of the injector and verify it really is an AirFiber 48VH unit and not some random UBNT thing he found. Sadly, that would not shock me - I assume a normal AirMax PoE injector would work - to a point. Remember, this site was up for years with no issues. I add load (the Metrolinq) and things go crazy. I get a clean month out of it, add a small UBNT ISO-Station and the problem returns. From my perspective, I'm thinking power.
How would the Mini act if it were fed by a normal AirMax PoE injector, and what would my max output power be?
A screenshot of the Mini shows the Metrolinq pulling 12W (port set to 48V), PowerBeam pulling 5W, Rocket 5AC Prism pulling 6W, and the ISO Station pulling 5W (although the top watts graph shows 34.2W). How does that switch core act if it's not getting the juice it should?
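Here is the arithmetic on those numbers as a quick Python sketch. The per-port readings and the 34.2W total are from the screenshot described above; treating the gap between them as the switch's own consumption plus conversion loss is my assumption, not something the screenshot states. The supply rating is left as a parameter to fill in from the label on whatever injector is actually feeding the Mini; the 50W used below is a placeholder, not a real spec.

```python
from typing import Dict, Optional

# Per-port readings from the Mini's status page, in watts.
port_draw_watts = {
    "Metrolinq": 12.0,
    "PowerBeam": 5.0,
    "Rocket 5AC Prism": 6.0,
    "ISO Station": 5.0,
}
reported_total_watts = 34.2  # value shown on the top watts graph


def power_summary(port_draw: Dict[str, float],
                  reported_total: float,
                  supply_watts: Optional[float] = None) -> Dict[str, float]:
    """Sum the PoE output and compare it with the total the switch reports."""
    poe_out = sum(port_draw.values())
    summary = {
        "poe_out_watts": poe_out,
        # Assumed to be the switch itself plus conversion loss.
        "unaccounted_watts": reported_total - poe_out,
    }
    if supply_watts is not None:
        # A negative headroom would mean the supply is being asked for more than its rating.
        summary["headroom_watts"] = supply_watts - reported_total
    return summary


summary = power_summary(port_draw_watts, reported_total_watts)
# Placeholder rating only; substitute the number printed on the real injector.
with_supply = power_summary(port_draw_watts, reported_total_watts, supply_watts=50.0)
```

The ports add up to 28W, leaving about 6.2W unaccounted for against the 34.2W graph. The interesting number is the headroom once the real injector rating is plugged in.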