After having a switch go bonkers on me today (I'm sure self inflicted), I need to get a better understanding of why and I say this humbly since it is my own doing.
I have a WS-8-150 DC at the top of our tower powering 3 ePMP APs. It is connected to the bottom of the tower (a WS-12-250 AC) via 2 fiber pairs (for redundancy).
At the top of the tower, they plug into SFPs in ports 7 & 8. At the bottom, they are in SFPs in ports 13 & 14.
I thought I had STP set up where if one fiber line failed for whatever reason, the other would take over. Today, I got a bunch of STP errors in my log and suddenly no traffic was being routed up the tower. Again, this is my doing.
I have attached screen shots of LAG and STP of both switches. It's not right but I don't know why. And, should I use LAG over STP for simple double pair redundancy up a tower??
Top of the tower:
Bottom of the tower:
I'm much better at reverse engineering a correct setup...
I wish I was smarter (LAG vs STP)
-
mike99 - Associate
- Posts: 837
- Joined: Tue Nov 25, 2014 10:53 am
- Location: Quebec, Canada
- Has thanked: 95 times
- Been thanked: 245 times
Re: I wish I was smarter (LAG vs STP)
LAG = load balancing + failover between 2 devices
STP = redundant links over multiple devices
If it's for failover between 2 devices, I would use LAG. I can't check the config right now since I'm on my phone, but maybe this afternoon if no other answer comes before then.
-
cwachs - Experienced Member
- Posts: 115
- Joined: Fri Nov 06, 2015 9:04 pm
- Location: Colorado
- Has thanked: 2 times
- Been thanked: 10 times
Re: I wish I was smarter (LAG vs STP)
I have moved that redundant fiber connection to a LAG. Working fine right now. It looks like my problem yesterday started with this:
Apr 5 15:02:53 STP: MSTI0: New root on port 3, root path cost is 40000
Port 3 is an AP and it suddenly became root and right after that, all h*ll broke loose on that switch:
Apr 5 15:08:05 STP: MSTI0: New root on port 3, root path cost is 20000
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 80000
Apr 5 15:08:05 STP: set port 7 to discarding
Apr 5 15:08:05 STP: set port 2 to discarding
Apr 5 15:08:05 STP: set port 3 to discarding
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 20000
Apr 5 15:08:05 STP: MSTI0: New root on port 25, root path cost is 20000
Apr 5 15:08:05 STP: set port 8 to discarding
Apr 5 15:08:05 STP: set port 7 to learning
Apr 5 15:08:05 STP: set port 7 to forwarding
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 120000
Apr 5 15:08:05 STP: set port 8 to learning
Apr 5 15:08:05 STP: set port 8 to forwarding
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 20000
Apr 5 15:08:05 STP: set port 7 to discarding
Apr 5 15:08:05 STP: MSTI0: New root on port 25, root path cost is 20000
This is an 8 port switch so I am guessing port 25 and 26 mean something else. Port 7 and 8 are the fiber lines feeding the switch from the bottom of the tower (from another Netonix). They were not in a LAG yesterday. As soon as I shut down one of those fiber lines from the bottom switch, I regained control of the top switch. Though port 3 (the AP) is still root - which seems not right.
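For anyone sorting through a similar log, here is a rough Python sketch that counts root changes and port state transitions. The regular expressions are only guessed from the lines above, so treat the format as an assumption, not a documented Netonix log spec.

```python
import re
from collections import Counter

# A few lines taken from the log above.
LOG = """\
Apr 5 15:08:05 STP: MSTI0: New root on port 3, root path cost is 20000
Apr 5 15:08:05 STP: MSTI0: New root on port 26, root path cost is 80000
Apr 5 15:08:05 STP: set port 7 to discarding
Apr 5 15:08:05 STP: set port 8 to discarding
Apr 5 15:08:05 STP: set port 7 to learning
Apr 5 15:08:05 STP: set port 7 to forwarding
"""

# Patterns inferred from the sample lines only (assumption).
ROOT_RE = re.compile(r"STP: MSTI\d+: New root on port (\d+), root path cost is (\d+)")
STATE_RE = re.compile(r"STP: set port (\d+) to (\w+)")


def summarize(log_text):
    """Return (root changes in log order, state-change count per port)."""
    root_changes = []          # list of (port, path cost)
    state_changes = Counter()  # port -> number of state transitions
    for line in log_text.splitlines():
        m = ROOT_RE.search(line)
        if m:
            root_changes.append((int(m.group(1)), int(m.group(2))))
            continue
        m = STATE_RE.search(line)
        if m:
            state_changes[int(m.group(1))] += 1
    return root_changes, state_changes


roots, states = summarize(LOG)
print("root changes:", roots)
print("state changes per port:", dict(states))
```

Many root changes within the same second, like the ones above, are a sign that the topology is not settling.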
-
cwachs - Experienced Member
- Posts: 115
- Joined: Fri Nov 06, 2015 9:04 pm
- Location: Colorado
- Has thanked: 2 times
- Been thanked: 10 times
Re: I wish I was smarter (LAG vs STP)
Well, as I dig deeper, at the same exact time of day (15:02), a few of my switches at different points in the network all got a new root... We are a hub and spoke network so the only connection point between these different switches is our NOC.
-
mike99 - Associate
- Posts: 837
- Joined: Tue Nov 25, 2014 10:53 am
- Location: Quebec, Canada
- Has thanked: 95 times
- Been thanked: 245 times
Re: I wish I was smarter (LAG vs STP)
25 and 26 are the SFP ports. It could be that the core always uses 26 ports, or maybe it makes programming easier (the SFP ports are 25 and 26 on every model, so fewer conditions in the code).
Is the AP on port 3 there to connect customers or is it a backhaul? If it's for customers, disable STP on this port, otherwise a customer device could mess with STP on your network. STP should be enabled only on trusted ports where you control every device connected. Also disable STP on wireless links, like in ubnt radios; it will probably only cause you pain.
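To illustrate why an untrusted port matters: in STP the bridge with the lowest bridge ID (priority first, then MAC address) becomes root. Below is a minimal Python sketch of that election. The device names, priorities and MACs are all made up, and it is not a diagnosis of what happened on this network.

```python
# Bridge ID = (priority, MAC address); the lowest bridge ID wins the root election.
# All names and values below are invented for illustration.
def bridge_id(priority, mac):
    return (priority, int(mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=bridges.get)


bridges = {
    "noc-core": bridge_id(32768, "00:11:22:33:44:55"),
    "tower-top": bridge_id(32768, "00:11:22:33:44:66"),
}
print(elect_root(bridges))  # noc-core (same priority, lower MAC)

# A customer device sending BPDUs with a lower priority takes over as root.
bridges["customer-router"] = bridge_id(4096, "aa:bb:cc:dd:ee:ff")
print(elect_root(bridges))  # customer-router
```

With STP disabled on customer-facing ports, a device like the hypothetical "customer-router" never takes part in the election.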
-
cwachs - Experienced Member
- Posts: 115
- Joined: Fri Nov 06, 2015 9:04 pm
- Location: Colorado
- Has thanked: 2 times
- Been thanked: 10 times
Re: I wish I was smarter (LAG vs STP)
Port 3 is an AP with customers on the other end of it.
Life just got worse for me. I turned off the POE power to the AP that was on port 3 (which was set as root). I needed to change out the AP due to a bad GPS chip. As soon as I powered down that AP, I lost all access to the switch. It vanished off the network. I have power cycled it 3 times and it does not come back up on the network. The switch at the bottom of the tower sends packets up the fiber but nothing comes back down.
Since this is a DC powered switch at the top of the tower, it is not easy to troubleshoot. Luckily, I never took down my old Ethernet lines that run up the tower so I got 2 of my 3 APs back online on copper.
Can this be related to STP or has my switch got something else going on? And I thought, way back when I watched "WISP Switch the Movie", STP was advised to be on all ports, including customer facing APs...
-
cwachs - Experienced Member
- Posts: 115
- Joined: Fri Nov 06, 2015 9:04 pm
- Location: Colorado
- Has thanked: 2 times
- Been thanked: 10 times
Lost access to a switch - STP related?
Moving topic https://forum.netonix.com/viewtopic.php?f=6&t=2656&p=18546#p18546 to this board now that I have a failure and am in need of some support advice.
This is a DC switch on the top of a tower powering 3 ePMP APs. It is connected to a switch at the bottom over 2 fiber paths (in a LAG). We were battling some apparent STP issues yesterday and today. During that time, port 3 (an ePMP AP) became designated as "ROOT" on the switch. We powered down the POE to that AP on port 3. As soon as we did that, we lost all access to the switch.
We have power cycled the switch a couple times. The switch at the bottom shows link for both fiber ports in the LAG and it is sending packets up the fiber but nothing is coming back down. The management IP of the switch is static. Nothing shows up in the MAC table for the fiber ports at the bottom. Switch on the tower is running 1.4.7rc14 and is a WS-8-150-DC.
Question 1: Is there any way to regain control of this switch short of a hard reset?
Question 2: STP related? The fact port 3 got designated as ROOT and then we powered down that port cause this?
Question 3: Should STP be enabled on ports connected to APs (PtMP APs) serving customers?
From what I know about STP, when the switch reboots, it should determine path cost and roles, so rebooting it should shake it free from port 3 thinking it is root or the fiber LAG as being NDP - which is the state it appears to be in?
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: I wish I was smarter (LAG vs STP)
So first off I am not sure why you did not play with LACP in the LAB before implementing it in service???
Anyway your screenshots are NOT correct
You have the LACP Key set at the TOP correctly but the LACP ports are NOT ENABLED?????
You do NOT have the LACP Key set at the BOTTOM at all and the LACP ports are NOT ENABLED?????
Might I suggest you read up on LACP and then play with 2 units in a LAB environment before implementing LIVE?
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: I wish I was smarter (LAG vs STP)
cwachs wrote: Moving topic https://forum.netonix.com/viewtopic.php?f=6&t=2656&p=18546#p18546 to this board now that I have a failure and am in need of some support advice.
This is a DC switch on the top of a tower powering 3 ePMP APs. It is connected to a switch at the bottom over 2 fiber paths (in a LAG). We were battling some apparent STP issues yesterday and today. During that time, port 3 (an ePMP AP) became designated as "ROOT" on the switch. We powered down the POE to that AP on port 3. As soon as we did that, we lost all access to the switch.
We have power cycled the switch a couple times. The switch at the bottom shows link for both fiber ports in the LAG and it is sending packets up the fiber but nothing is coming back down. The management IP of the switch is static. Nothing shows up in the MAC table for the fiber ports at the bottom. Switch on the tower is running 1.4.7rc14 and is a WS-8-150-DC.
Question 1: Is there any way to regain control of this switch short of a hard reset?
Question 2: STP related? The fact port 3 got designated as ROOT and then we powered down that port cause this?
Question 3: Should STP be enabled on ports connected to APs (PtMP APs) serving customers?
From what I know about STP, when the switch reboots, it should determine path cost and roles, so rebooting it should shake it free from port 3 thinking it is root or the fiber LAG as being NDP - which is the state it appears to be in?
As far as regaining control of the switch, it should be simple: use ONE of the fibers by unplugging the other.
You may need to power cycle them?
Your main problem occurred because you did not have LACP set up at all.
You only specified the Key on one end and you failed to enable LACP ports on both switches.
At this point you have a LOOP, which RSTP was trying to deal with. But if all your switches have RSTP enabled, things will shift around when you unplug things, and depending on your RSTP settings a new Root may be established.
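As a rough illustration of the checks described above, here is a small Python sketch that flags LAG member ports with LACP disabled or no key set. The data model and field names are invented for the example and are not the Netonix config format. The key value 10 is an arbitrary placeholder. The sketch only looks at each switch on its own and makes no claim about whether keys must match between the two ends.

```python
# Hypothetical per-port LACP settings; field names are made up for illustration.
def lag_problems(name, ports, lag_ports):
    """Return a list of problems found on the LAG member ports of one switch."""
    problems = []
    keys = set()
    for p in lag_ports:
        cfg = ports[p]
        if not cfg["lacp_enabled"]:
            problems.append(f"{name}: port {p} has LACP disabled")
        if cfg["key"] is None:
            problems.append(f"{name}: port {p} has no LACP key")
        else:
            keys.add(cfg["key"])
    if len(keys) > 1:
        problems.append(f"{name}: LAG ports use different keys {sorted(keys)}")
    return problems


# State as described for the screenshots: key set at the top only,
# LACP not enabled on any of the four fiber ports.
top = {7: {"lacp_enabled": False, "key": 10}, 8: {"lacp_enabled": False, "key": 10}}
bottom = {13: {"lacp_enabled": False, "key": None}, 14: {"lacp_enabled": False, "key": None}}

issues = lag_problems("top", top, [7, 8]) + lag_problems("bottom", bottom, [13, 14])
for issue in issues:
    print(issue)
```

With LACP enabled and a key set on all four ports, both calls return an empty list.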
-
cwachs - Experienced Member
- Posts: 115
- Joined: Fri Nov 06, 2015 9:04 pm
- Location: Colorado
- Has thanked: 2 times
- Been thanked: 10 times
Re: I wish I was smarter (LAG vs STP)
We did have LACP setup and working - after those screen shots were sent. Key was 10 on both ends using an active LACP. Both ports on top and bottom were enabled. We tested it by dropping a fiber and the LACP functioned as it should. That was all post screen shot where we were not using LACP.
About 12 hours after putting the LACP into action, we lost the switch when we powered down a POE port at the top of the tower. That same POE port had been designated ROOT even though it was attached to an AP serving customers. None of the customers below it can have DHCP traffic coming upstream - our radios do not allow that.
We have power cycled the radio a couple times. I have turned off both of the fiber ports separately. We tried turning off the LACP at the bottom and shutting off one of the fiber ports. Nothing gets any packets to return from the top of the tower.
I have TFTP auto backup enabled so I have a copy of the latest working config just before and just after we turned off the POE power at the top switch.