LAG causes loop when link goes down

Posted: Sun Nov 20, 2016 9:13 am
by giannici
I have two links aggregated with LAG.
Often, when one of the links goes down (or becomes very bad), a loop is detected and the entire link is taken down!
Even a MikroTik in the network logs a message that a loop has been seen. So it's NOT a bug in the detection but in the implementation of LAG!

Today I had to disable the port of one of the aggregated links; the loop happened and the link was taken down for 3 minutes!



Code:
Nov 20 12:17:04 UI: Configuration changed by root (10.10.0.9)
Nov 20 12:17:04 UI: LACP 3 Enable: changed from 'Enabled' to 'Disabled'
Nov 20 12:17:07 LACP: LACP changed state to Not active on port 3 (key 10)
Nov 20 12:17:07 switch[355]: LACP changed state to Not active on port 3 (BH Cicero (AF5X) DATA) (key 10)
Nov 20 12:17:31 UI: Configuration auto backup successful
Nov 20 12:17:47 Loop protection: detected loop from port 3 to port 2, disabling port 2 for 180 seconds
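
To make the timing in this log easier to read, here is a minimal Python sketch that measures the gap between LACP going inactive on port 3 and the loop being detected, and works out when port 2 should come back. The year (2016, taken from the post date) and the helper names `parse_stamp` and `first_match` are assumptions for illustration; the log lines are copied from above.

```python
from datetime import datetime, timedelta

# Log lines copied from the post above.
LOG = """\
Nov 20 12:17:04 UI: LACP 3 Enable: changed from 'Enabled' to 'Disabled'
Nov 20 12:17:07 LACP: LACP changed state to Not active on port 3 (key 10)
Nov 20 12:17:47 Loop protection: detected loop from port 3 to port 2, disabling port 2 for 180 seconds
"""

YEAR = 2016  # assumption: the log has no year, so use the year of the post


def parse_stamp(line, year=YEAR):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp of a log line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def first_match(lines, needle):
    """Return the first line containing needle, or raise if none does."""
    for line in lines:
        if needle in line:
            return line
    raise ValueError(f"no log line contains {needle!r}")


lines = LOG.splitlines()
lacp_down = parse_stamp(first_match(lines, "LACP changed state to Not active"))
loop_seen = parse_stamp(first_match(lines, "Loop protection: detected loop"))

# Seconds between LACP leaving the aggregate and the loop being detected.
gap = (loop_seen - lacp_down).total_seconds()
# The log says port 2 is disabled for 180 seconds.
port_back = loop_seen + timedelta(seconds=180)

print(f"loop detected {gap:.0f} s after LACP went inactive")
print(f"port 2 re-enabled at about {port_back:%H:%M:%S}")
```

On this log the gap is 40 seconds, and the 180 second hold matches the 3 minute outage described above.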


Here is the LAG configuration:

Image

The firmware has been updated to 1.4.5 but with no effect.

Is this a known bug?
Am I the only one to see it?

Thanks.

Re: LAG causes loop when link goes down

Posted: Sun Nov 20, 2016 10:23 am
by sirhc
So I have LACP LAGs between 2 WS switches and I do not see this issue.
Using cables between switch ports.

I also have a ton of STATIC LAGs between WS and Cisco Routers and do not see this.
Using cables between switch ports and router ports.

Explain which devices your LACP LAG runs between and what the medium is between them. A diagram would help.

You can disable Loop Protection on the Device/Configuration Tab, but the fact that the MT is also seeing the loop tells me there is a loop somewhere.

Re: LAG causes loop when link goes down

Posted: Sun Nov 20, 2016 12:26 pm
by giannici
Here is a diagram of the situation:

Image
So, there are 3 Netonix switches.
The connections between SW1 and SW2, and between SW2 and SW3, are link aggregations.
All these links are AirFibers.
Each AirFiber is connected to the switch with both the DATA port (the aggregated port) and the MNG port.
Each AirFiber has the "In-Band Management" option disabled.
The two LAGs use different keys: key 10 for link SW1-SW2 and key 2 for link SW2-SW3.

Is this all OK?
Is there something that I'm missing?

Thanks

Re: LAG causes loop when link goes down

Posted: Sun Nov 20, 2016 12:44 pm
by sirhc
Well, personally I would not use LAGs across a wireless medium. A LAG is unaware of the variable capacity of a wireless link. It only sees the physical link state, so the switches think each link is capable of 1G Full Duplex, because that is the physical port link state.
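
As a rough illustration of that point, here is a small Python sketch of hash-based flow distribution across LAG members. It is not the switch's actual algorithm: the CRC32 hash over source and destination MAC, the flow list, and the radio capacities are all hypothetical. It only shows that the member is chosen from the frame headers, with no input from how much the radio behind the port can really carry.

```python
import zlib


def pick_member(src_mac, dst_mac, n_members):
    """Pick a LAG member from the MAC pair (hypothetical hash, for illustration)."""
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode()
    return zlib.crc32(key) % n_members


def offered_load(flows, n_members):
    """Sum the offered Mbps per member; link capacity plays no part in the choice."""
    load = [0.0] * n_members
    for src, dst, mbps in flows:
        load[pick_member(src, dst, n_members)] += mbps
    return load


# Hypothetical traffic: 20 flows of 50 Mbps each toward one router MAC.
flows = [(f"00:00:00:00:00:{i:02x}", "00:00:00:00:ff:01", 50.0) for i in range(1, 21)]

# Hypothetical radio capacities in Mbps; both ports still report 1G link state.
capacity = [700.0, 150.0]

load = offered_load(flows, len(capacity))
for member, (mbps, cap) in enumerate(zip(load, capacity)):
    status = "OVERLOADED" if mbps > cap else "ok"
    print(f"member {member}: offered {mbps:.0f} Mbps, radio capacity {cap:.0f} Mbps, {status}")
```

A given flow always lands on the same member, so a degraded radio keeps receiving its share of flows for as long as its Ethernet port stays up.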

I also do not know whether the radio bridge will properly pass the special packets LACP needs to work. Those packets are not supposed to pass through an Ethernet interface; they are supposed to stop at the first interface they hit. So I am not sure.

Since we have tested LAGs extensively over cables and they work just fine, I would have to guess the issue is with the AFX radios, either somehow bleeding the BPDU packets across the wireless links or not passing the special LACP control packets correctly.

As I said, you can turn off Loop Protection, but the big red flag here is that your MT router is also seeing a loop.

viewtopic.php?f=6&t=887&p=6705&hilit=+LACP+wireless#p6705
viewtopic.php?f=17&t=764&p=5675&hilit=+LACP+wireless#p5675
viewtopic.php?f=6&t=473&p=3112#p3112
viewtopic.php?f=6&t=739&p=5475#p5475

Re: LAG causes loop when link goes down

Posted: Sun Nov 20, 2016 1:33 pm
by giannici
Well, a few days ago I already tried disabling the loop protection. The result was that the entire network went down!!!
So, THERE IS a loop when one of the links goes down...

Re: LAG causes loop when link goes down

Posted: Sun Nov 20, 2016 1:35 pm
by giannici
And I think the AirFibers do pass the control packets, because the LACP links go Active/Inactive as needed.

Re: LAG causes loop when link goes down

Posted: Sun Nov 20, 2016 1:38 pm
by giannici
And, as I said in the initial message, the loop also occurred when I simply set the Link3 port to "Disabled" on the SW3 switch.

I disabled Port 3 and the switch said "Loop protection: detected loop from port 3 to port 2, disabling port 2 for 180 seconds"


So, there must be some problem somewhere...

Re: LAG causes loop when link goes down

Posted: Sun Nov 20, 2016 1:46 pm
by sirhc
Not sure what to say.

My guess is the AF radios have something to do with it?

I set up a lab with 3 switches using LACP as shown above, using just cables in place of the AF links, and I cannot get this to happen.

Maybe the AF radios are bleeding packets across the management port?

Maybe you can set up a lab and try to re-create it with just cables, as maybe I am missing something you're doing?

I would also enable RSTP on the switch and all ports, as it may provide better information about where the loop is happening. I always suggest using RSTP with LAGs.