LAGs reloaded, static and with LACP
Posted: Thu Nov 26, 2015 6:48 pm
While trying to set up the topology pictured in my post "LAGs and RSTP may cause broadcast storm", I could not get it to work with static LAGs. It worked once I changed all of them to LACP LAGs. Of course, this seems to confirm what Chris said (he said: use LACP), but I want to get my point documented anyway.
Before I go into the details, I'd like to emphasize that I really like the Netonix switches - or I wouldn't do this testing and reporting. I like them for their design tailored to passive PoE guys as we all are and the many features not available if I had to use standard 802.3af/at stuff. The software has been improved to the point where I would fully trust it in a single switch scenario and that's what we've started to use them for. I want them to work in our redundant setup on core sites, though, and that's why I'm testing LAGs and RSTP ...
Now about static vs. LACP'ed LAGs: Basically one could say that LACP only adds two features. First, it detects a failed leg when the failure is not obvious because something keeps the link state up - this could be a media converter or an Airfiber if you wanted to LAG to another site. Second, it prevents a loop if someone plugs in two cables between switches on ports that are not statically configured to be in a LAG.
However, a loop must not occur on ports that are statically configured to be in a LAG, with or without STP. LACP is also not necessary to redistribute traffic to the remaining legs of a LAG if one leg fails - a static LAG will do that just as well, by definition and tested on Netonix by me ;-). Nor is LACP necessary to get the switches to agree on the distribution algorithm, as this is solely the decision of each switch. So if it's really only cables between the switches and they are plugged in correctly, a static LAG isn't worse than a LAG with LACP.
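To illustrate the first point, here is a minimal Python sketch of how a static LAG and an LACP LAG might decide whether a leg is usable. The function names are made up for illustration and this is not how Netonix or any other switch implements it; the 3 s and 90 s values are the short and long LACP timeouts defined by the standard.

```python
# Hypothetical model of leg health, for illustration only.
SHORT_TIMEOUT_S = 3.0   # LACP short timeout (partner sends LACPDUs every 1 s)
LONG_TIMEOUT_S = 90.0   # LACP long timeout (partner sends LACPDUs every 30 s)


def leg_usable_static(link_up):
    """A static LAG only knows the physical link state of the port."""
    return link_up


def leg_usable_lacp(link_up, seconds_since_last_lacpdu, timeout_s=SHORT_TIMEOUT_S):
    """An LACP LAG additionally requires a recent LACPDU from the partner."""
    return link_up and seconds_since_last_lacpdu <= timeout_s


# A media converter keeps the local link up although the far side is dead,
# so no LACPDU has arrived for 10 seconds.
print("static LAG keeps using the leg:", leg_usable_static(True))
print("LACP LAG keeps using the leg:", leg_usable_lacp(True, 10.0))
```

With plain cables between the switches, a dead leg also drops the link state, so both checks come to the same result.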
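The redistribution argument can be sketched in a few lines of Python. This is only a model under my own assumptions: the function `pick_leg`, the port names, and the CRC32-over-MAC-pair hash are illustrative, and real switches use their own (usually configurable) hash. The point is that each switch picks the egress leg locally from its list of active legs, so no protocol is needed to move traffic off a leg that has lost link.

```python
import zlib


def pick_leg(src_mac, dst_mac, active_legs):
    """Pick an egress leg for a frame by hashing its MAC pair (illustrative hash)."""
    if not active_legs:
        raise ValueError("LAG has no active legs")
    key = (src_mac.lower() + ">" + dst_mac.lower()).encode()
    return active_legs[zlib.crc32(key) % len(active_legs)]


# Hypothetical two-leg LAG and eight flows towards one destination.
legs = ["port1", "port2"]
flows = [("00:00:00:00:00:%02x" % i, "00:00:00:00:01:00") for i in range(8)]

# Both legs up: flows are spread by the local hash.
before = {flow: pick_leg(flow[0], flow[1], legs) for flow in flows}

# port2 loses link: the switch drops it from the active list and rehashes.
remaining = [leg for leg in legs if leg != "port2"]
after = {flow: pick_leg(flow[0], flow[1], remaining) for flow in flows}

for flow in flows:
    print(flow[0], before[flow], "->", after[flow])
```

The other switch may use a completely different hash for the return direction; frames still arrive because every leg of the LAG ends on the same logical port.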
Back to the issue: When I set up my 4-switch-LAGed-ring-thing pictured in the other post, static LAGs worked as long as all switches were DLINK. When I replaced S3 and S4 with Netonix switches, I could not ping a switch unless I was directly connected to one of its ports. Actually, that's not the full truth, as I was *sometimes* able to ping across a LAG, but that never worked for long, as if there was some MAC caching or ARP timeout involved. I didn't investigate further because when I changed all switches to use LACP, everything started to work (nothing else changed).
Now, as I was thinking to myself, you may say that whatever makes it work should be kept and that's it. However, by definition, LACP should not be required for what I wanted to achieve, and in fact it is not if it's only between DLINKs. Well, take it or leave it - it's not a showstopper and I will have it LACP'ed to get the switches into production (once the loop thing is resolved).