This boils down to semantics.
Locked up doesn't define something that needs to be rebooted per se. It defines something that was once working or moving that doesn't anymore.
This is what's happening. Call it what you will but the switch is locked up/unresponsive to ANY management unless something reboots it or you can access the radio from the far side and disable flow control on it or whatever.
Dropping ports on new WS, what is wrong with my setup?
-
tma - Experienced Member
- Posts: 122
- Joined: Tue Mar 03, 2015 4:07 pm
- Location: Oberursel, Germany
- Has thanked: 15 times
- Been thanked: 14 times
Re: Dropping ports on new WS, what is wrong with my setup?
sirhc wrote:And we would love to investigate this "other" issue but apparently it is not easily repeatable? ... If someone can provide a LAP that exposes an issue we will jump right on it.
Yes, unfortunately I failed to reproduce it in the lab setup I had back then. So far, it seems we can only prove to UBNT that they have a FC problem during signal fade, which is somehow different from the issue discussed here because the switches were able to take up to 600 Mbps of FC frames. I hope I can better document the 8 Mbps thing when I see it again - and at the same time I hope it will never hit me on a production site ;-)
Last edited by tma on Wed May 11, 2016 10:56 am, edited 1 time in total.
--
Thomas Giger
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
tma wrote:sirhc wrote:If unplugging the AF radio allows the switch to return to normal that is not the switch being locked up. That is the switch being jammed full of Pause Frames THOUSANDS PER SECOND causing all the ports to be Paused into submission. This is an event that should never happen unless a piece of equipment is malfunctioning.
I doubt that an 8 Mbps FC stream will cause the switch to become inaccessible when it has taken 600 Mbps and one point two million pause frames per second today, remaining accessible all the time and even delivering traffic to the downstream customers. When it becomes inaccessible, something else must be going on than the AF freaking out in the way described.
HUH, WHAT???? No, it is not taking 1.2 MILLION pause frames per second? Maybe total for the day not per second, simply not mathematically possible on a functioning segment. Are you drinking already today Thomas? And you offered me none? - NOT NICE, WAY TO NOT SHARE THOMAS!!
And yes, 8 Mbps of nothing but Pause Frames is a crap load of pause frames and not a real world environment. Pause frames are minimum-size frames whose only real payload is a 2 byte integer (0 through 65535), the pause time. Also, we have no idea how long each Pause Frame is instructing the switch port to pause. The pause time is measured in pause "quanta", where one quantum is 512 bit times.
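To put a number on "how long", here is a rough sketch of the quanta arithmetic. The function name is made up for illustration and is not taken from any switch firmware; the only fact it relies on is that one pause quantum is 512 bit times, so the wall-clock pause depends on link speed.

```python
# One pause quantum is 512 bit times (IEEE 802.3 flow control).
QUANTUM_BITS = 512


def pause_duration_seconds(quanta: int, link_bps: float) -> float:
    """Time one Pause Frame asks a port to stay quiet (hypothetical helper)."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    return quanta * QUANTUM_BITS / link_bps


if __name__ == "__main__":
    # A pause time of 0 means "resume now"; 0xFFFF is the longest single pause.
    for q in (0, 1, 0xFFFF):
        ms = pause_duration_seconds(q, 1e9) * 1e3
        print(f"{q:5d} quanta at 1 Gbps -> {ms:.4f} ms")
```

At 1 Gbps the maximum value works out to roughly 33.5 ms per frame, so a sender only has to repeat a max-quanta Pause Frame a few dozen times per second to hold a port paused indefinitely.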
If the switch is receiving TENS OF THOUSANDS of Pause Frames per second this is not a real world event that should ever be seen on a properly running network. This many Pause Frames would definitely mess with the switch and cause HOL or head-of-line-blocking shutting down pretty much most if not all ports until the offending Pause Frames stop and port buffers free up.
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.
-
tma - Experienced Member
- Posts: 122
- Joined: Tue Mar 03, 2015 4:07 pm
- Location: Oberursel, Germany
- Has thanked: 15 times
- Been thanked: 14 times
Re: Dropping ports on new WS, what is wrong with my setup?
Sorry, I messed up with the "quote" tags. Of course YOU wrote you would love to investigate. Let's see if I can edit this out.
--
Thomas Giger
-
tma - Experienced Member
- Posts: 122
- Joined: Tue Mar 03, 2015 4:07 pm
- Location: Oberursel, Germany
- Has thanked: 15 times
- Been thanked: 14 times
Re: Dropping ports on new WS, what is wrong with my setup?
sirhc wrote:HUH, WHAT???? No, it is not taking 1.2 MILLION pause frames per second? Maybe total for the day not per second, simply not mathematically possible on a functioning segment. Are you drinking already today Thomas? And you offered me none? - NOT NICE!!
Not before 6 pm (usually) ... but please see my screenshots. The AF did send it 1.2 million pps of frames, supposedly FC frames. To one switch it was doing that with 600 Mbps, to the other at 200-550 Mbps. And while it was doing that *real* traffic just came through fine (which is why it remained accessible).
Well, if the graph on the switch was wrong, my conclusions would be wrong too ...
Last edited by tma on Wed May 11, 2016 4:37 pm, edited 1 time in total.
--
Thomas Giger
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
adairw wrote:This boils down to semantics.
Locked up doesn't define something that needs to be rebooted per se. It defines something that was once working or moving that doesn't anymore.
This is what's happening. Call it what you will but the switch is locked up/unresponsive to ANY management unless something reboots it or you can access the radio from the far side and disable flow control on it or whatever.
The switch is inaccessible because it is doing as it is supposed to.
It receives a Pause Frame and Pauses the port as instructed. It receives THOUSANDS of Pause Frames per second and again it does as instructed. This pausing causes the switch buffers to be exhausted causing HOL then causes other ports to be paused to the point they are not passing traffic. Remove the source of the Pause Frames and the switch returns to normal.
Flow control is a standard defined by the IEEE (802.3x).
There is nothing defined in the IEEE spec as a mechanism to deal with a malfunctioning piece of equipment that sends THOUSANDS of Pause Frames per second because it should not happen and our switch core follows the IEEE spec.
We have already learned that this event affects other brands of switches the same way including a Cisco switch. When confronted with this never ending stream of Pause Frames the Cisco behaved the same way.
We did put a KLUDGE in rc19 which is NOT in the IEEE spec which will attempt to detect this event and disable FC on the offending port and log it in the log.
I am not sure what you want from us? At this time we "think" this is a bug in the AF firmware which I am sure UBNT is investigating. One of two things will happen:
1: They find it and release a fix, case closed.
2: They claim it is not them, and then we look at it again and also open a ticket with Cisco as it appears to affect their switches as well.
If this is a stream of Pause Frames from the AF to the switch then the switches are doing what they should according to the IEEE standard; they are not messing up.
I think we have gone above and beyond by adding a non-standard KLUDGE to v1.4.0rc19 to hopefully prevent WISPs from needing to roll a truck if this happens until we figure it out.
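For anyone wondering what a guard like that could look like in principle, here is a minimal sketch. To be clear, this is NOT the rc19 code; the class name, the 10,000 pps threshold and the three-strikes rule are all assumptions picked for illustration. The idea is just: sample the per-port received Pause Frame counter, and if the rate stays above a threshold for several intervals in a row, turn flow control off on that port and log it.

```python
from dataclasses import dataclass, field


@dataclass
class PauseFloodGuard:
    """Hypothetical pause-flood detector; all thresholds are assumptions."""
    threshold_pps: int = 10_000   # assumed "flood" rate, not a vendor value
    strikes_needed: int = 3       # consecutive bad intervals before acting
    strikes: dict = field(default_factory=dict)
    fc_disabled: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def sample(self, port: int, rx_pause_frames: int,
               interval_s: float = 1.0) -> bool:
        """Feed one counter delta; return True once FC is disabled on port."""
        if port in self.fc_disabled:
            return True
        rate = rx_pause_frames / interval_s
        if rate >= self.threshold_pps:
            self.strikes[port] = self.strikes.get(port, 0) + 1
        else:
            self.strikes[port] = 0  # a quiet interval resets the count
        if self.strikes[port] >= self.strikes_needed:
            self.fc_disabled.add(port)
            self.log.append(
                f"port {port}: pause flood ({rate:.0f} pps), "
                "flow control disabled"
            )
            return True
        return False
```

Requiring several consecutive intervals keeps a short, legitimate burst of Pause Frames from tripping the guard.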
-
tma - Experienced Member
- Posts: 122
- Joined: Tue Mar 03, 2015 4:07 pm
- Location: Oberursel, Germany
- Has thanked: 15 times
- Been thanked: 14 times
Re: Dropping ports on new WS, what is wrong with my setup?
EDIT: this is BS of course. The numbers show the rates when the 600 Mbps flood had stopped. So forget this:
[ Hmm. I had a look at my screenshots again. In fact, the switch says that it is doing 3.14/4.60 Mbps on that port, which correlates to the real traffic we had at that time. But the graph shows this 600 Mbps line. Likewise, it says it's doing 639/613 pps but the graph shows 1.24 Mpps. Not sure what to believe unless the graph takes something (FC frames?) into account that the numbers do not.
Note that I posted these screenshots on the UBNT forum as well because I think this is their bug. I'd like Eric to say something about the difference in numbers and the graph - just in case they take me up on that difference between numbers and graph. ]
Last edited by tma on Wed May 11, 2016 11:22 am, edited 1 time in total.
--
Thomas Giger
-
tma - Experienced Member
- Posts: 122
- Joined: Tue Mar 03, 2015 4:07 pm
- Location: Oberursel, Germany
- Has thanked: 15 times
- Been thanked: 14 times
Re: Dropping ports on new WS, what is wrong with my setup?
sirhc wrote:HUH, WHAT???? No, it is not taking 1.2 MILLION pause frames per second? Maybe total for the day not per second, simply not mathematically possible on a functioning segment.
It seems mathematically possible to me: 1.25 M packets/sec * 64 bytes/packet * 8 bits/byte = 640 M bits/sec.
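The same arithmetic as a small sketch, in case someone wants to plug in other numbers. The function names are mine; the 20 bytes of per-frame overhead (8 bytes preamble/SFD plus 12 bytes inter-frame gap) is standard Ethernet framing and is what limits a gigabit link to about 1.488 Mpps of 64-byte frames.

```python
FRAME_BYTES = 64       # minimum Ethernet frame, which is what a Pause Frame is
WIRE_OVERHEAD = 20     # preamble + SFD (8 bytes) + inter-frame gap (12 bytes)


def frame_rate_to_bps(pps: float, frame_bytes: int = FRAME_BYTES) -> float:
    """Bit rate counting frame bytes only (how most port graphs count)."""
    return pps * frame_bytes * 8


def bps_to_frame_rate(bps: float, frame_bytes: int = FRAME_BYTES) -> float:
    """Frames per second for a given bit rate of fixed-size frames."""
    return bps / (frame_bytes * 8)


def max_pps(link_bps: float, frame_bytes: int = FRAME_BYTES) -> float:
    """Line-rate ceiling once preamble and inter-frame gap are included."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD) * 8)


if __name__ == "__main__":
    print(frame_rate_to_bps(1.25e6) / 1e6, "Mbps for 1.25 Mpps")
    print(bps_to_frame_rate(8e6), "pps for 8 Mbps of 64-byte frames")
    print(int(max_pps(1e9)), "pps ceiling on a 1 Gbps link")
```

So 1.25 Mpps of 64-byte frames sits below the gigabit ceiling, and the earlier 8 Mbps case corresponds to 15,625 frames per second.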
--
Thomas Giger
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
tma wrote:sirhc wrote:HUH, WHAT???? No, it is not taking 1.2 MILLION pause frames per second? Maybe total for the day not per second, simply not mathematically possible on a functioning segment.
It seems mathematically possible to me: 1.25 M packets/sec * 64 bytes/packet * 8 bits/byte = 640 M bits/sec.
Well sure if you want to believe math.
Interesting....
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Dropping ports on new WS, what is wrong with my setup?
You know, something has been rattling around in my head, you know how the AF radios fill the wireless pipe to capacity with STUFF even when they are not passing real traffic? I complained about this practice way back on the UBNT forum when the AF5 was announced. I complained that the noise would be there 24/7 even on idle links.
What if the AF radios are stuffing the pipe to fill it up and the modulation steps down for whatever reason (noise, rain fade, whatever)? What happens then? Does it maybe make a mistake and send a Pause Frame?
I know, grasping at straws here.
Another harebrained idea: what address are the AF radios sending the Pause Frame to, the MAC address of the switch port or the special multicast address reserved for switch packets?
So many things here could come into play.
What really needs to happen is that, on a switch/AF that is exhibiting the problem, we put a computer in the middle with bridged NICs running Wireshark and examine the packets. I hope this is what UBNT is doing. If they bounce it back to us this is what I will do, but first I have to re-create a setup that exhibits the problem, as my current network topology does not cause the event.
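As a rough idea of what to look for in such a capture, here is a sketch that picks Pause Frames out of raw Ethernet frames and reports the destination address and pause time. It assumes untagged frames as a capture tool hands them over (no preamble, no FCS), and the function names are made up for illustration. The constants are the standard ones: MAC Control EtherType 0x8808, PAUSE opcode 0x0001, and the reserved multicast address 01:80:C2:00:00:01.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def parse_pause_frame(frame: bytes):
    """Return (dst, src, quanta) for a Pause Frame, else None.

    Assumes an untagged frame starting at the destination MAC.
    """
    if len(frame) < 18:
        return None
    dst, src = frame[0:6], frame[6:12]
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return dst, src, quanta


def uses_reserved_multicast(dst: bytes) -> bool:
    """True if the frame was sent to 01:80:C2:00:00:01 rather than unicast."""
    return dst == PAUSE_DST


if __name__ == "__main__":
    # Synthetic example frame: max pause time, padded to 60 bytes (no FCS).
    src = bytes.fromhex("020000000001")
    frame = PAUSE_DST + src + struct.pack("!HHH", 0x8808, 1, 0xFFFF) + bytes(42)
    dst, _, quanta = parse_pause_frame(frame)
    print(dst.hex(":"), quanta, uses_reserved_multicast(dst))
```

One caveat: many NICs consume MAC Control frames in hardware and never pass them up to the capture software, so a tap or a NIC that can be told to pass them through may be needed.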
But if, as Chuck said, they send a Tx Pause Frame for every packet received when their buffers are full, what happens if they think there is an issue, or the wireless link is stuck and not passing traffic so the Ethernet side fills up?
Keep in mind they are running a software driven switch/bridge not a core.