I wish I was smarter (LAG vs STP)
Posted: Wed Apr 05, 2017 8:34 pm
by cwachs
After having a switch go bonkers on me today (I'm sure self inflicted), I need to get a better understanding of why and I say this humbly since it is my own doing.
I have a WS-8-150 DC at the top of our tower powering 3 ePMP APs. It is connected to the bottom of the tower (a WS-12-250 AC) via 2 fiber pairs (for redundancy).
On the top of the tower, they are into SFPs in ports 7 & 8. At the bottom, they are in SFPs in ports 13 & 14.
I thought I had STP set up so that if one fiber line failed for whatever reason, the other would take over. Today, I got a bunch of STP errors in my log and suddenly no traffic was being routed up the tower. Again, this is my doing.
I have attached screen shots of LAG and STP of both switches. It's not right but I don't know why. And, should I use LAG over STP for simple double pair redundancy up a tower??
Top of the tower:
Bottom of the tower:
I'm much better at reverse engineering a correct setup...
Re: I wish I was smarter (LAG vs STP)
Posted: Thu Apr 06, 2017 10:16 am
by mike99
LAG = load balancing + failover between 2 devices
STP = redundant links over multiple devices
If it's for failover between 2 devices, I would use LAG. I can't check your config right now since I'm on my phone, but maybe this afternoon if no other answer comes before then.
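To make the "load balancing + failover" idea concrete, here is a rough Python sketch of how a LAG can spread flows over its member links and fall back to the surviving link. It is only an illustration: the CRC-based hash, the port names, and the MAC addresses are made up for the example, and real switches use their own hashing.

```python
import zlib

def pick_member(src_mac, dst_mac, active_links):
    """Pick one active LAG member for a flow by hashing its MAC pair."""
    if not active_links:
        raise RuntimeError("no active links left in the LAG")
    flow_hash = zlib.crc32(f"{src_mac}->{dst_mac}".encode())
    return active_links[flow_hash % len(active_links)]

# Hypothetical flows; the addresses are invented for illustration.
flows = [
    ("02:00:00:00:00:01", "02:00:00:00:01:01"),
    ("02:00:00:00:00:02", "02:00:00:00:01:02"),
    ("02:00:00:00:00:03", "02:00:00:00:01:03"),
]

# Both fibers up: each flow sticks to one member link.
both_up = {flow: pick_member(*flow, ["port7", "port8"]) for flow in flows}

# Fiber on port 7 fails: every flow is rehashed onto the survivor.
one_down = {flow: pick_member(*flow, ["port8"]) for flow in flows}
```

Each flow always hashes to the same link while the member list is unchanged, which keeps frames in order; when a link drops out of the list, traffic simply moves to what is left.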
Re: I wish I was smarter (LAG vs STP)
Posted: Thu Apr 06, 2017 10:22 am
by cwachs
I have moved that redundant fiber connection to a LAG. Working fine right now. It looks like my problem yesterday started with this:
Apr 5 15:02:53 STP: MSTI0: New root on port 3, root path cost is 40000
Port 3 is an AP and it suddenly became root and right after that, all h*ll broke loose on that switch:
Apr 5 15:08:05 STP: MSTI0: New root on port 3, root path cost is 20000
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 80000
Apr 5 15:08:05 STP: set port 7 to discarding
Apr 5 15:08:05 STP: set port 2 to discarding
Apr 5 15:08:05 STP: set port 3 to discarding
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 20000
Apr 5 15:08:05 STP: MSTI0: New root on port 25, root path cost is 20000
Apr 5 15:08:05 STP: set port 8 to discarding
Apr 5 15:08:05 STP: set port 7 to learning
Apr 5 15:08:05 STP: set port 7 to forwarding
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 120000
Apr 5 15:08:05 STP: set port 8 to learning
Apr 5 15:08:05 STP: set port 8 to forwarding
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 20000
Apr 5 15:08:05 STP: set port 7 to discarding
Apr 5 15:08:05 STP: MSTI0: New root on port 25, root path cost is 20000
This is an 8 port switch so I am guessing port 25 and 26 mean something else. Port 7 and 8 are the fiber lines feeding the switch from the bottom of the tower (from another Netonix). They were not in a LAG yesterday. As soon as I shut down one of those fiber lines from the bottom switch, I regained control of the top switch. Though port 3 (the AP) is still root - which seems not right.
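A burst like the one above is easier to read once it is tallied. The following is a minimal Python sketch that assumes only the two message shapes visible in the excerpt ("New root on port N" and "set port N to STATE"); it is not a Netonix tool, and other firmware messages would need their own patterns. The log text is the 15:08:05 burst copied from the post.

```python
import re
from collections import Counter

LOG = """\
Apr 5 15:08:05 STP: MSTI0: New root on port 3, root path cost is 20000
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 80000
Apr 5 15:08:05 STP: set port 7 to discarding
Apr 5 15:08:05 STP: set port 2 to discarding
Apr 5 15:08:05 STP: set port 3 to discarding
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 20000
Apr 5 15:08:05 STP: MSTI0: New root on port 25, root path cost is 20000
Apr 5 15:08:05 STP: set port 8 to discarding
Apr 5 15:08:05 STP: set port 7 to learning
Apr 5 15:08:05 STP: set port 7 to forwarding
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 120000
Apr 5 15:08:05 STP: set port 8 to learning
Apr 5 15:08:05 STP: set port 8 to forwarding
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 20000
Apr 5 15:08:05 STP: set port 7 to discarding
Apr 5 15:08:05 STP: MSTI0: New root on port 25, root path cost is 20000
"""

ROOT_RE = re.compile(r"STP: MSTI\d+: New root on port (\d+), root path cost is (\d+)")
STATE_RE = re.compile(r"STP: set port (\d+) to (\w+)")

root_changes = Counter()   # how often each port was announced as the root port
last_state = {}            # most recent STP state logged for each port

for line in LOG.splitlines():
    root = ROOT_RE.search(line)
    state = STATE_RE.search(line)
    if root:
        root_changes[int(root.group(1))] += 1
    elif state:
        last_state[int(state.group(1))] = state.group(2)

print("root port announcements:", dict(root_changes))
print("last logged state per port:", last_state)
```

Seven root changes within a single second, bouncing between ports 3, 25 and 26, is the signature of a topology that cannot settle.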
Re: I wish I was smarter (LAG vs STP)
Posted: Thu Apr 06, 2017 10:50 am
by cwachs
Well, as I dig deeper, at the same exact time of day (15:02), a few of my switches at different points in the network all got a new root... We are a hub and spoke network so the only connection point between these different switches is our NOC.
Re: I wish I was smarter (LAG vs STP)
Posted: Thu Apr 06, 2017 3:06 pm
by mike99
25 and 26 are SFP ports. It could be that the switch core in use always has 26 ports, or maybe it is to make programming easier (SFP ports are 25 and 26 on every model, so fewer conditions in the code).
Is the port 3 AP there to connect customers or for backhaul? If it is for customers, disable STP on this port, otherwise a customer device could mess with STP on your network. STP should be enabled only on trusted ports where you control all the connected devices. Also disable STP on wireless links, as on ubnt radios; it will probably only cause you pain.
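The reason an untrusted device can take over is the way STP elects its root: the bridge with the lowest bridge ID (priority first, then MAC address) wins. Here is a small Python sketch of that comparison. The device names and MAC addresses are invented for illustration; 32768 is the standard default bridge priority.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int   # lower wins; 32768 is the usual default
    mac: str        # tie-breaker; lower wins

    def bridge_id(self):
        # Compare priority first, then the MAC address as a number.
        return (self.priority, int(self.mac.replace(":", ""), 16))

def elect_root(bridges):
    """Return the bridge with the lowest (priority, MAC) pair."""
    return min(bridges, key=Bridge.bridge_id)

# Everything left at the default priority: the lowest MAC wins,
# even if it belongs to a device behind a customer-facing AP.
defaults = [
    Bridge("noc-core", 32768, "02:00:00:00:00:30"),
    Bridge("tower-top", 32768, "02:00:00:00:00:20"),
    Bridge("customer-router", 32768, "02:00:00:00:00:01"),
]

# Same network with the core given a lower priority on purpose.
pinned = [Bridge("noc-core", 4096, "02:00:00:00:00:30")] + defaults[1:]

print(elect_root(defaults).name)  # customer-router
print(elect_root(pinned).name)    # noc-core
```

Pinning the priority on the switch you want as root helps, but it does not stop a customer device from advertising an even lower priority, which is why STP is best kept off ports you do not control.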
Re: I wish I was smarter (LAG vs STP)
Posted: Thu Apr 06, 2017 3:35 pm
by cwachs
Port 3 is an AP with customers on the other end of it.
Life just got worse for me. I turned off the POE power to the AP that was on port 3 (which was set as root). I needed to change out the AP due to a bad GPS chip. As soon as I powered down that AP, I lost all access to the switch. It vanished off the network. I have power cycled it 3 times and it does not come back up on the network. The switch at the bottom of the tower sends packets up the fiber but nothing comes back down.
Since this is a DC powered switch at the top of the tower, it is not easy to troubleshoot. Luckily, I never took down my old Ethernet lines that run up the tower so I got 2 of my 3 APs back online on copper.
Can this be related to STP or does my switch have something else going on? And I thought way back when I watched "WISP Switch the Movie", STP was advised to be on all ports, including customer facing APs...
Lost access to a switch - STP related?
Posted: Thu Apr 06, 2017 4:26 pm
by cwachs
Moving topic https://forum.netonix.com/viewtopic.php?f=6&t=2656&p=18546#p18546 to this board now that I have a failure and am in need of some support advice.
This is a DC switch on the top of a tower powering 3 ePMP APs. It is connected to a switch at the bottom over 2 fiber paths (in a LAG). We were battling some apparent STP issues yesterday and today. During that time, port 3 (an ePMP AP) became designated as "ROOT" on the switch. We powered down the POE to that AP on port 3. As soon as we did that, we lost all access to the switch.
We have power cycled the switch a couple times. The switch at the bottom shows link for both fiber ports in the LAG and it is sending packets up the fiber but nothing is coming back down. The management IP of the switch is static. Nothing shows up in the MAC table for the fiber ports at the bottom. Switch on the tower is running 1.4.7rc14 and is a WS-8-150-DC.
Question 1: Is there any way to regain control of this switch short of a hard reset?
Question 2: Is this STP related? Could the fact that port 3 got designated as ROOT, and that we then powered down that port, have caused this?
Question 3: Should STP be enabled on ports connected to APs (PtMP APs) serving customers?
From what I know about STP, when the switch reboots, it should determine path cost and roles, so rebooting it should shake it free from thinking port 3 is root or the fiber LAG is NDP - which is the state it appears to be in?
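On the path cost point: the root path cost a switch reports is the sum of the port costs along the way to the root. The sketch below uses the recommended values from IEEE 802.1D-2004 (20,000,000 divided by the link speed in Mb/s) and assumes no port cost was overridden by hand, which may not be true on any given network.

```python
def port_path_cost(speed_mbps):
    """Recommended RSTP port cost for a link speed (802.1D-2004 table)."""
    return 20_000_000 // speed_mbps

def root_path_cost(hop_speeds_mbps):
    """Root path cost is the sum of the port costs on the way to the root."""
    return sum(port_path_cost(speed) for speed in hop_speeds_mbps)

print(port_path_cost(1000))          # 20000  (one gigabit hop)
print(port_path_cost(100))           # 200000 (one 100 Mb/s hop)
print(root_path_cost([1000, 1000]))  # 40000  (two gigabit hops)
```

Under those assumptions, the "root path cost is 40000" and "20000" lines in the earlier log would correspond to a root two gigabit hops and one gigabit hop away through port 3.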
Re: I wish I was smarter (LAG vs STP)
Posted: Thu Apr 06, 2017 4:53 pm
by sirhc
So first off I am not sure why you did not play with LACP in the LAB before implementing it in service???
Anyway your screenshots are NOT correct
You have the LACP Key set at the TOP correctly but the LACP ports are NOT ENABLED?????
You do NOT have the LACP Key set at the BOTTOM at all and the LACP ports are NOT ENABLED?????
Might I suggest you read up on LACP and then play with 2 units in a LAB environment before implementing LIVE?
Re: I wish I was smarter (LAG vs STP)
Posted: Thu Apr 06, 2017 5:01 pm
by sirhc
Regaining control of the switch should be simple: use ONE of the fibers by unplugging the other.
You may need to power cycle them?
Your main problem occurred because you did not have LACP set up at all.
You only specified the Key on one end and you failed to enable LACP ports on both switches.
At this point you have a LOOP, which RSTP was trying to deal with. But if all your switches have RSTP enabled, things will shift around when you unplug things, and depending on your RSTP settings a new Root may be established.
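The checks described here (a key set on each end, LACP enabled on every member port) can be written down as a quick sanity test. This Python sketch uses a made-up dictionary layout, not the Netonix config format, and it assumes a key of 0 means "not set"; the key value 10 is likewise just an example.

```python
def lag_problems(switch, ports):
    """List reasons why these ports would not form a LAG on one switch.

    ports maps a port number to {"lacp_enabled": bool, "key": int}.
    Hypothetical structure for illustration; key 0 is treated as unset.
    """
    problems = []
    for number, cfg in sorted(ports.items()):
        if not cfg["lacp_enabled"]:
            problems.append(f"{switch} port {number}: LACP not enabled")
    keys = {cfg["key"] for cfg in ports.values()}
    if 0 in keys:
        problems.append(f"{switch}: LACP key not set on every member port")
    elif len(keys) > 1:
        problems.append(f"{switch}: member ports use different keys {sorted(keys)}")
    return problems

# Roughly the situation described for the screenshots.
top = {7: {"lacp_enabled": False, "key": 10}, 8: {"lacp_enabled": False, "key": 10}}
bottom = {13: {"lacp_enabled": False, "key": 0}, 14: {"lacp_enabled": False, "key": 0}}

# What a working setup would look like in this model.
fixed = {7: {"lacp_enabled": True, "key": 10}, 8: {"lacp_enabled": True, "key": 10}}

for name, ports in (("top", top), ("bottom", bottom), ("fixed", fixed)):
    print(name, lag_problems(name, ports) or "looks OK")
```

Until both ends pass checks like these, the two fibers are just two parallel links between the same pair of switches, which is the loop RSTP ends up fighting.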
Re: I wish I was smarter (LAG vs STP)
Posted: Thu Apr 06, 2017 5:12 pm
by cwachs
We did have LACP setup and working - after those screen shots were sent. Key was 10 on both ends using an active LACP. Both ports on top and bottom were enabled. We tested it by dropping a fiber and the LACP functioned as it should. That was all post screen shot where we were not using LACP.
About 12 hours after putting the LACP into action, we lost the switch when we powered down a POE port at the top of the tower. That same POE port had been designated ROOT even though it was attached to an AP serving customers. None of the customers below it can have DHCP traffic coming upstream - our radios do not allow that.
We have power cycled the radio a couple times. I have turned off both of the fiber ports separately. We tried turning off the LACP at the bottom and shutting off one of the fiber ports. Nothing gets any packets to return from the top of the tower.
I have TFTP auto backup enabled so I have a copy of the latest working config just before and just after we turned off the POE power at the top switch.