LAGs and RSTP may cause broadcast storm

tma
Experienced Member
 
Posts: 122
Joined: Tue Mar 03, 2015 4:07 pm
Location: Oberursel, Germany
Has thanked: 15 times
Been thanked: 14 times

LAGs and RSTP may cause broadcast storm

Wed Nov 25, 2015 3:45 pm

In an attempt to maximize redundancy (as best as you can with devices that have one Ethernet port only), we build all the major basestations with 2 routers and 4 switches. The 4 switches are connected as shown in this drawing:

Image

Each of the blue/green, red/yellow and grey/black pairs is a LAG, and the LAGs form a loop over all switches, which obviously needs RSTP to prevent packet storms. We use multiple VLANs to partition the switches into (usually) 6 broadcast domains. Each LAG is a VLAN trunk for all of them.

In the lab setup tested now, S1/S2 are DLINK DGS-1210-24 switches with latest firmware and S3/S4 are Netonix WS-24-400A with firmware 1.3.7. Yes, I know 1.3.8 is available now, but it was not when I started this setup and there isn't anything in the release notes that would indicate a fix for the problems I will report. Also, please excuse that the drawing shows connections to ports 25-28 which don't exist on the WS-24-400A of course but just imagine these cables are connected to ports 21-24 instead.

So we got the RSTP ring

S1.23===S3.23 S3.21===S4.21 S4.23===S2.23 S2.21===S1.21
S1.24===S3.24 S3.22===S4.22 S4.24===S2.24 S2.22===S1.22

where the 2 lines are LAG'ed with LACP/A on all ports.

S1 is the STP root bridge with priority 8192.
If S1 fails, S2 is supposed to become the root bridge with priority 16384.
S3 and S4 have priority 32768. "STP on LAGs" is enabled.
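
For illustration, the intended failover can be sketched in Python. This is only a minimal sketch of the standard rule that the bridge with the lowest bridge ID (priority first, then MAC address) becomes root. The priorities are the ones listed above; the MAC addresses are made up, since they are not given here.

```python
def elect_root(bridges):
    """Return the name of the bridge with the lowest (priority, MAC) bridge ID."""
    return min(bridges, key=lambda name: bridges[name])

# Priorities as configured above; MAC addresses are hypothetical placeholders.
bridges = {
    "S1": (8192, "00:00:00:00:00:01"),
    "S2": (16384, "00:00:00:00:00:02"),
    "S3": (32768, "00:00:00:00:00:03"),
    "S4": (32768, "00:00:00:00:00:04"),
}

root = elect_root(bridges)
# Simulate S1 failing by removing it from the election.
backup = elect_root({name: bid for name, bid in bridges.items() if name != "S1"})
print(root, backup)
```

With these values S1 wins the election, and S2 takes over once S1 is removed.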

Problem #1:

With everything in RSTP left at its defaults, the loop would be prevented by RSTP like this:

S1.23===S3.23 S3.21===S4.21 S4.23<--S2.23 S2.21===S1.21
S1.24===S3.24 S3.22===S4.22 S4.24<--S2.24 S2.22===S1.22

That's not too bad, but (for reasons of where the traffic is expected) I want it to be like this

S1.23===S3.23 S3.21-->S4.21 S4.23===S2.23 S2.21===S1.21
S1.24===S3.24 S3.22-->S4.22 S4.24===S2.24 S2.22===S1.22

The appropriate thing to enforce this would be to add path cost on S4.21 and S4.22.

Image

As the screenshot of the S4 Netonix shows, S4 selects the path to S3 as the root path, even if I add significant path cost to prevent exactly that. The DLINKs would have 19900 cost for GbE links, so 50000 should be enough, and I even tried with 200000 and probably more. The only way to get traffic out on S4.23 and S4.24 is by unplugging the LAG cable pair between S3 and S4.
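
To make the expectation concrete, here is a minimal Python sketch of how a bridge picks its root port: it adds its own port path cost to the root path cost advertised by each neighbour and takes the lowest total, with the sender's bridge priority as a tie-breaker. The 19900 figure for the DLINK side and the 50000 override are from this post. The 20000 default for the Netonix ports is an assumption (it is the value recommended by IEEE 802.1D-2004 for 1 Gb/s), and `pick_root_port` is a hypothetical helper, not anything from the firmware.

```python
def pick_root_port(candidates):
    """Pick the port with the lowest total root path cost.

    candidates maps a port name to a tuple of
    (root path cost advertised by the neighbour,
     path cost configured on the local port,
     bridge priority of the neighbour, used as a tie-breaker).
    """
    def key(port):
        advertised, local_cost, sender_priority = candidates[port]
        return (advertised + local_cost, sender_priority)
    return min(candidates, key=key)

DLINK_GBE = 19900    # from the post: DLINK cost for a GbE link
NETONIX_GBE = 20000  # assumption: 802.1D-2004 recommended cost for 1 Gb/s

# S4's two ways to reach the root bridge S1.
s4_candidates = {
    # S3 advertises the cost of its own link to S1; S4 adds the raised 50000.
    "LAG S4.21/22 (to S3)": (NETONIX_GBE, 50000, 32768),
    # S2 advertises the cost of its link to S1; S4 adds its default cost.
    "LAG S4.23/24 (to S2)": (DLINK_GBE, NETONIX_GBE, 16384),
}

root_port = pick_root_port(s4_candidates)
print(root_port)
```

Under these assumptions the total via S3 is 70000 and the total via S2 is 39900, so the LAG towards S2 should become the root port. That is the behaviour the raised path cost was meant to produce.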

But be careful when trying that, as this is how I discovered problem #2.

Problem #2:

In this setup, when I unplug the cable at S3.21 (and only this one) like this

S1.23===S3.23 S3.21 /// S4.21 S4.23===S2.23 S2.21===S1.21
S1.24===S3.24 S3.22===S4.22 S4.24===S2.24 S2.22===S1.22

a storm will start if there is at least one other port connected (not shown here) where packets are injected. The storm can be terminated by reconnecting S3 and S4 fully, i.e. with both LAG legs, or by unplugging the remaining leg.

Obviously, something is wrong with your LAG/RSTP implementation (and actually, as reported in a previous post but still unproven, I believe it is the LAG part). Do I hear you say it is the DLINK's fault? So take the DLINKs out of the picture, doing this (S3 and S4 are the Netonixes)

S4.23===S3.23 S3.21===S4.21
S4.24===S3.24 S3.22===S4.22

which works okay as expected, but as soon as one leg of a LAG is unplugged like this

S4.23===S3.23 S3.21 /// S4.21
S4.24===S3.24 S3.22===S4.22

the storm starts.
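
Why a wrongly forwarding port is enough to cause a storm can be shown with a toy flooding model in Python. It is only a sketch under strong simplifications: each LAG is treated as one logical link, MAC learning and host ports are ignored, and `flood` is a made-up helper. A switch floods a broadcast frame out of every forwarding link except the one it arrived on.

```python
def flood(links, blocked, origin, hops=10):
    """Count broadcast copies still in flight after `hops` forwarding steps.

    links   : list of (name, switch_a, switch_b) inter-switch links
    blocked : set of link names that RSTP keeps out of forwarding
    origin  : switch where a host injects one broadcast frame
    """
    active = [link for link in links if link[0] not in blocked]

    def far_end(link, switch):
        _, a, b = link
        return b if switch == a else a

    # The origin floods the frame out of every active link it owns.
    in_flight = [(far_end(l, origin), l[0]) for l in active if origin in l[1:]]
    for _ in range(hops):
        next_round = []
        for switch, ingress in in_flight:
            for link in active:
                # Flood on every other active link, never back out the ingress.
                if link[0] != ingress and switch in link[1:]:
                    next_round.append((far_end(link, switch), link[0]))
        in_flight = next_round
    return len(in_flight)

# The two-switch test: S3 and S4 joined by two LAGs.
links = [("LAG 21/22", "S3", "S4"), ("LAG 23/24", "S3", "S4")]

looped = flood(links, blocked=set(), origin="S3")            # both forwarding
protected = flood(links, blocked={"LAG 23/24"}, origin="S3")  # RSTP blocks one
print(looped, protected)
```

With one LAG blocked the frame dies out after the first hop. With both forwarding, copies keep circulating between S3 and S4 no matter how many hops are simulated.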

You might argue that this may depend on how I used the VLANs on these LAGs, but as a matter of fact it works well as long as both legs of all LAGs are connected, and losing one leg should never cause a storm to develop. You can believe me that the LAG ports have all T's on all VLANs and trunking 1-4096 is enabled.

Actually, I do have a problem #3, but I fear this would get us in the static LAG vs. LACP LAG debate again. So I will put this, if at all, into a separate thread.
--
Thomas Giger

tma

Re: LAGs and RSTP may cause broadcast storm

Thu Nov 26, 2015 9:53 pm

Chris, I hope you've read this one. THIS ONE is the showstopper for me (the other threads about LAG and RSTP are not). This one has LACP and RSTP enabled as you recommend it and it should be fairly easy to reproduce my setup and the heavy storm that can be created. I would also send you my config files by PM.
--
Thomas Giger

Eric Stern
Employee
 
Posts: 532
Joined: Wed Apr 09, 2014 9:41 pm
Location: Toronto, Ontario
Has thanked: 0 time
Been thanked: 130 times

Re: LAGs and RSTP may cause broadcast storm

Tue Dec 01, 2015 7:45 pm

Confirmed. When one of the LAG ports goes down the remaining one is incorrectly set to a forwarding state. RSTP will eventually correct this but not until the broadcast storm has occurred for several seconds.

This will be fixed in 1.3.9.

tma

Re: LAGs and RSTP may cause broadcast storm

Wed Dec 02, 2015 3:47 am

Hi Eric, thank you for verifying and fixing the storm thing. My lab setup is still in place and I will verify it when 1.3.9 final or an RC mentions the fix.

Have you been able to reproduce/fix the path-cost-problem too?
--
Thomas Giger

Eric Stern

Re: LAGs and RSTP may cause broadcast storm

Wed Dec 02, 2015 5:51 pm

I was able to get to that today.

It wasn't working because you were configuring the path cost for the port(s), not the LAG. But the configuration for the LAG was not exposed, it is hard coded to defaults.

I changed it to use the config from the port(s) as the config for the LAG. The way you had it configured should work now.

This fix will be in 1.3.9rc6.

tma

Re: LAGs and RSTP may cause broadcast storm

Wed Dec 02, 2015 5:57 pm

Again, thank you very much!
--
Thomas Giger

tma

Re: LAGs and RSTP may cause broadcast storm

Thu Dec 03, 2015 5:11 am

Just wanted to verify what you said while I am still on 1.3.7 (before verifying the fix on 1.3.9rc6):

Eric Stern wrote: Confirmed. When one of the LAG ports goes down the remaining one is incorrectly set to a forwarding state. RSTP will eventually correct this but not until the broadcast storm has occurred for several seconds.


Just FYI: In my lab setup the broadcast storm will occur forever, where "forever" means at least 3 minutes :-). Maybe the storm is so strong that it blocks RSTP packets.
--
Thomas Giger

tma

Re: LAGs and RSTP may cause broadcast storm

Thu Dec 03, 2015 6:17 am

Upgraded my lab setup to 1.3.9rc6 and now I can confirm that both the "Broadcast storm when using LACP and RSTP" and "Unable to change RSTP path cost for LAGs" problems are solved in 1.3.9rc6.

Just two questions for Eric: What value do you use as the RSTP path cost of a LAG? Is it one of the values entered (e.g. the first leg of the LAG, or the lowest cost, or ...) or do you compute a common path cost from all the LAG legs, e.g. as if the path costs were resistors in parallel? And: would the path cost change if one of the LAG legs is broken?

Personally, I would favor the "resistors in parallel" method because that would allow for dynamic decisions like "if one of the LAG legs fails, rearrange the STP topology to use the other path with a fully functioning LAG".
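
As a rough illustration of the "resistors in parallel" idea, a combined cost could be computed like this in Python. This is just a sketch of the suggestion, not a description of what any firmware does; `parallel_path_cost` is a hypothetical name and the 20000 per-leg cost is an assumed GbE value.

```python
from fractions import Fraction

def parallel_path_cost(leg_costs):
    """Combine per-leg path costs like resistors in parallel: 1 / sum(1 / c)."""
    if not leg_costs:
        raise ValueError("a LAG needs at least one active leg")
    # Fractions keep the arithmetic exact before rounding to an integer cost.
    return round(1 / sum(Fraction(1, cost) for cost in leg_costs))

both_legs_up = parallel_path_cost([20000, 20000])
one_leg_up = parallel_path_cost([20000])
print(both_legs_up, one_leg_up)
```

With two equal legs the combined cost is half that of a single leg, so a LAG that loses a leg becomes more expensive and RSTP could then prefer a path over a fully working LAG.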

Anyways ... thanks again for the quick fix.
--
Thomas Giger

Eric Stern

Re: LAGs and RSTP may cause broadcast storm

Thu Dec 03, 2015 1:48 pm

tma wrote: Upgraded my lab setup to 1.3.9rc6 and now I can confirm that both the "Broadcast storm when using LACP and RSTP" and "Unable to change RSTP path cost for LAGs" problems are solved in 1.3.9rc6.

Just two questions for Eric: What value do you use as the RSTP path cost of a LAG? Is it one of the values entered (e.g. the first leg of the LAG, or the lowest cost, or ...) or do you compute a common path cost from all the LAG legs, e.g. as if the path costs were resistors in parallel? And: would the path cost change if one of the LAG legs is broken?


If you specify a path cost, it just uses that value. If you leave it blank (auto) it calculates the path cost based on the number of currently active ports in the LAG.
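
In Python terms, the rule described above might look roughly like the sketch below. The exact formula used in auto mode is not stated in this thread; the sketch assumes the IEEE 802.1D-2004 recommendation of 20,000,000 divided by the link speed in Mb/s, applied to the combined speed of the active legs. `lag_path_cost` and its parameters are hypothetical names.

```python
def lag_path_cost(configured, active_legs, leg_speed_mbps=1000):
    """Return the RSTP path cost for a LAG.

    configured     : explicit path cost, or None for "auto"
    active_legs    : number of LAG member ports currently up
    leg_speed_mbps : speed of one leg (assumed 1 Gb/s by default)
    """
    if configured is not None:
        # An explicit value is used as-is, regardless of link state.
        return configured
    if active_legs < 1:
        raise ValueError("a LAG needs at least one active leg")
    # Assumption: 802.1D-2004 style cost based on aggregate bandwidth.
    return 20_000_000 // (active_legs * leg_speed_mbps)

auto_two_legs = lag_path_cost(None, 2)
auto_one_leg = lag_path_cost(None, 1)
fixed = lag_path_cost(50000, 1)
print(auto_two_legs, auto_one_leg, fixed)
```

Under this assumption the auto cost doubles when a two-leg GbE LAG loses one leg, while a manually entered cost stays the same.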
