We run a bunch of WS-12-250-DCs in our network. We standardized on 1.5.1rc6 as it was GA at the time we rolled out, and honestly, these switches have been ROCK solid since! Most of them are installed in top-of-tower timing cabinets with PacketFlux gear to Cambium Canopy equipment. Most of our towers have few enough APs that a single 12-port unit is enough to do everything. We use fiber down the tower to get to whatever is at the base.
The tower base equipment is usually a Juniper switch or a small Cisco router. We run VSTP/Per-VLAN STP on the other gear. Our Netonixes are set up in RSTP mode with RSTP disabled on the port facing the tower base switch to avoid problems arising from having the different types of BPDUs in use.
Today we had a first: adding a second one of these switches to a tower. This tower is running a Juniper switch at the base with VSTP on all VLANs and ports. Like the others, the Netonix at the top has RSTP disabled on the port going to the Juniper. RSTP on the box otherwise looked sane. The second WS-12-250-DC is basically a clone of the first. They are connected together on port 11. The first switch is the RSTP root, as it has the lower bridge ID.
Upon connecting the second switch, the two started going nuts:
Dec 31 19:02:51 Port: link state changed to 'up' (1G) on port 11
Dec 31 19:02:51 STP: msti 0 set port 11 to discarding
Dec 31 19:02:51 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Dec 31 19:02:52 STP: msti 0 set port 11 to learning
Dec 31 19:02:52 STP: msti 0 set port 11 to forwarding
Dec 31 19:02:57 system: starting ntpclient
Apr 15 11:24:43 system: time set by NTP server
Apr 15 11:25:08 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:25:19 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:25:30 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:26:04 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:27:10 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:28:05 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:29:14 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
The New Root messages continued that way for hours until I found a fix.
Unfortunately, syslog doesn't seem to be working on the first switch, so I have only its internal log to go by. I'm assuming the following had been happening since the same time:
Apr 15 16:38:00 monitor: restarting vtss_appl
Apr 15 16:38:01 STP: msti 0 set port 1 to discarding
Apr 15 16:38:01 STP: msti 0 set port 2 to discarding
Apr 15 16:38:01 STP: msti 0 set port 3 to discarding
Apr 15 16:38:01 STP: msti 0 set port 4 to discarding
Apr 15 16:38:01 STP: msti 0 set port 6 to discarding
Apr 15 16:38:01 STP: msti 0 set port 7 to discarding
Apr 15 16:38:01 STP: msti 0 set port 8 to discarding
Apr 15 16:38:02 STP: msti 0 set port 9 to discarding
Apr 15 16:38:02 STP: msti 0 set port 10 to discarding
Apr 15 16:38:02 STP: msti 0 set port 11 to discarding
Apr 15 16:38:02 STP: msti 0 set port 13 to discarding
Apr 15 16:38:02 STP: msti 0 set port 11 to learning
Apr 15 16:38:02 STP: msti 0 set port 11 to forwarding
Apr 15 16:38:04 Port: link state changed to 'down' on port 13
Apr 15 16:38:11 monitor: restarting vtss_appl
Apr 15 16:38:12 STP: msti 0 set port 1 to discarding
Apr 15 16:38:12 STP: msti 0 set port 2 to discarding
Apr 15 16:38:12 STP: msti 0 set port 3 to discarding
Apr 15 16:38:12 STP: msti 0 set port 4 to discarding
Apr 15 16:38:12 STP: msti 0 set port 6 to discarding
Apr 15 16:38:13 STP: msti 0 set port 7 to discarding
Apr 15 16:38:13 STP: msti 0 set port 8 to discarding
Apr 15 16:38:13 STP: msti 0 set port 9 to discarding
Apr 15 16:38:13 STP: msti 0 set port 10 to discarding
Apr 15 16:38:13 STP: msti 0 set port 11 to discarding
Apr 15 16:38:13 STP: msti 0 set port 13 to discarding
Apr 15 16:38:14 STP: msti 0 set port 11 to learning
Apr 15 16:38:14 STP: msti 0 set port 11 to forwarding
Apr 15 16:38:15 Port: link state changed to 'down' on port 13
Apr 15 16:38:23 monitor: restarting vtss_appl
Apr 15 16:38:24 STP: msti 0 set port 1 to discarding
Apr 15 16:38:24 STP: msti 0 set port 2 to discarding
Apr 15 16:38:24 STP: msti 0 set port 3 to discarding
Apr 15 16:38:24 STP: msti 0 set port 4 to discarding
Apr 15 16:38:24 STP: msti 0 set port 6 to discarding
Apr 15 16:38:24 STP: msti 0 set port 7 to discarding
Apr 15 16:38:24 STP: msti 0 set port 8 to discarding
Apr 15 16:38:24 STP: msti 0 set port 9 to discarding
Apr 15 16:38:24 STP: msti 0 set port 10 to discarding
Apr 15 16:38:24 STP: msti 0 set port 11 to discarding
Apr 15 16:38:25 STP: msti 0 set port 13 to discarding
Apr 15 16:38:25 STP: msti 0 set port 11 to learning
Apr 15 16:38:25 STP: msti 0 set port 11 to forwarding
Apr 15 16:38:26 Port: link state changed to 'down' on port 13
Apr 15 16:38:26 Port: link state changed to 'down' on port 14
Apr 15 16:38:34 monitor: restarting vtss_appl
Apr 15 16:38:35 STP: msti 0 set port 1 to discarding
Apr 15 16:38:35 STP: msti 0 set port 2 to discarding
Apr 15 16:38:35 STP: msti 0 set port 3 to discarding
Apr 15 16:38:35 STP: msti 0 set port 4 to discarding
Apr 15 16:38:35 STP: msti 0 set port 6 to discarding
Apr 15 16:38:35 STP: msti 0 set port 7 to discarding
Apr 15 16:38:35 STP: msti 0 set port 8 to discarding
Apr 15 16:38:35 STP: msti 0 set port 9 to discarding
Apr 15 16:38:36 STP: msti 0 set port 10 to discarding
Apr 15 16:38:36 STP: msti 0 set port 11 to discarding
Apr 15 16:38:36 STP: msti 0 set port 13 to discarding
Apr 15 16:38:36 STP: msti 0 set port 11 to learning
Apr 15 16:38:36 STP: msti 0 set port 11 to forwarding
Apr 15 16:38:37 Port: link state changed to 'down' on port 13
...and so on. Apparently an endless loop of bouncing STP on all ports, vtss_appl restarting, and the SFP ports flapping.
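To put numbers on the loop, here is a rough Python sketch that pulls the `monitor: restarting vtss_appl` lines out of a pasted log and returns the gap between consecutive restarts. The function name `restart_gaps` and the placeholder year are my own assumptions (the internal log doesn't print a year); the sample lines are copied from the log above.

```python
from datetime import datetime

def restart_gaps(log_text, year=2000):
    """Return seconds between consecutive 'restarting vtss_appl' lines.

    Assumes syslog-style 'Mon DD HH:MM:SS' timestamps. The log carries no
    year, so an arbitrary one is supplied (hypothetical default).
    """
    stamps = []
    for line in log_text.splitlines():
        if "monitor: restarting vtss_appl" not in line:
            continue
        # The first three whitespace-separated fields are month, day, time.
        month, day, clock = line.split()[:3]
        stamps.append(
            datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        )
    # Pair each restart with the next one and take the difference.
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

sample = """\
Apr 15 16:38:00 monitor: restarting vtss_appl
Apr 15 16:38:11 monitor: restarting vtss_appl
Apr 15 16:38:23 monitor: restarting vtss_appl
Apr 15 16:38:34 monitor: restarting vtss_appl
"""
print(restart_gaps(sample))  # [11, 12, 11]
```

For the four restarts in this excerpt the gaps are 11 to 12 seconds, so the process was dying again almost as soon as it came back up.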
I'd like to point out the flapping of the SFP ports, 13 and 14. 13 was a copper SFP connected to a Canopy AP that's been there for some time. I checked to make sure we didn't accidentally have a dual-homed business customer connected, and it turns out the AP actually has no subscribers on it. Port 14 is the uplink to the Juniper switch.
I actually misdiagnosed this as an SFP or cable issue from the Juniper's logs, as we had a few other unforeseen issues during this maintenance activity. At 17:25 I changed the Juniper's config to disable VSTP entirely on the port up to the Netonix. I did this to get rid of the learning phase so our customers would weather the problem better, since there's no chance of an L2 loop between the two switches.
This had no immediate effect. After 14 minutes, the flapping mysteriously stopped on switch #1:
Apr 15 17:35:06 Port: link state changed to 'down' on port 13
Apr 15 17:35:06 Port: link state changed to 'down' on port 14
Apr 15 17:35:07 STP: msti 0 set port 10 to learning
Apr 15 17:35:07 Port: link state changed to 'up' (1G) on port 14
Apr 15 17:35:07 STP: msti 0 set port 10 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 9 to learning
Apr 15 17:35:07 STP: msti 0 set port 9 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 8 to learning
Apr 15 17:35:07 STP: msti 0 set port 8 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 7 to learning
Apr 15 17:35:07 STP: msti 0 set port 7 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 6 to learning
Apr 15 17:35:07 STP: msti 0 set port 6 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 4 to learning
Apr 15 17:35:08 STP: msti 0 set port 4 to forwarding
Apr 15 17:35:08 STP: msti 0 set port 3 to learning
Apr 15 17:35:08 STP: msti 0 set port 3 to forwarding
Apr 15 17:35:08 STP: msti 0 set port 2 to learning
Apr 15 17:35:08 STP: msti 0 set port 2 to forwarding
Apr 15 17:35:08 STP: msti 0 set port 1 to learning
Apr 15 17:35:08 STP: msti 0 set port 1 to forwarding
Apr 15 17:35:09 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:35:09 STP: msti 0 set port 13 to discarding
Apr 15 17:35:12 STP: msti 0 set port 13 to learning
Apr 15 17:35:12 STP: msti 0 set port 13 to forwarding
Apr 15 17:35:58 monitor: restarting vtss_appl
Apr 15 17:35:59 STP: msti 0 set port 1 to discarding
Apr 15 17:35:59 STP: msti 0 set port 2 to discarding
Apr 15 17:35:59 STP: msti 0 set port 3 to discarding
Apr 15 17:35:59 STP: msti 0 set port 4 to discarding
Apr 15 17:35:59 STP: msti 0 set port 6 to discarding
Apr 15 17:35:59 STP: msti 0 set port 7 to discarding
Apr 15 17:36:00 STP: msti 0 set port 8 to discarding
Apr 15 17:36:00 STP: msti 0 set port 9 to discarding
Apr 15 17:36:00 STP: msti 0 set port 10 to discarding
Apr 15 17:36:00 STP: msti 0 set port 11 to discarding
Apr 15 17:36:00 STP: msti 0 set port 13 to discarding
Apr 15 17:36:00 STP: msti 0 set port 11 to learning
Apr 15 17:36:00 STP: msti 0 set port 11 to forwarding
Apr 15 17:36:01 Port: link state changed to 'down' on port 13
Apr 15 17:36:02 STP: msti 0 set port 10 to learning
Apr 15 17:36:03 STP: msti 0 set port 10 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 9 to learning
Apr 15 17:36:03 STP: msti 0 set port 9 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 8 to learning
Apr 15 17:36:03 STP: msti 0 set port 8 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 7 to learning
Apr 15 17:36:03 STP: msti 0 set port 7 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 6 to learning
Apr 15 17:36:03 STP: msti 0 set port 6 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 4 to learning
Apr 15 17:36:03 STP: msti 0 set port 4 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 3 to learning
Apr 15 17:36:03 STP: msti 0 set port 3 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 2 to learning
Apr 15 17:36:03 STP: msti 0 set port 2 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 1 to learning
Apr 15 17:36:03 STP: msti 0 set port 1 to forwarding
Apr 15 17:36:04 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:36:04 STP: msti 0 set port 13 to discarding
Apr 15 17:36:07 STP: msti 0 set port 13 to learning
Apr 15 17:36:07 STP: msti 0 set port 13 to forwarding
Apr 15 17:37:05 monitor: restarting vtss_appl
Apr 15 17:37:06 STP: msti 0 set port 1 to discarding
Apr 15 17:37:06 STP: msti 0 set port 2 to discarding
Apr 15 17:37:06 STP: msti 0 set port 3 to discarding
Apr 15 17:37:06 STP: msti 0 set port 4 to discarding
Apr 15 17:37:06 STP: msti 0 set port 6 to discarding
Apr 15 17:37:06 STP: msti 0 set port 7 to discarding
Apr 15 17:37:06 STP: msti 0 set port 8 to discarding
Apr 15 17:37:06 STP: msti 0 set port 9 to discarding
Apr 15 17:37:06 STP: msti 0 set port 10 to discarding
Apr 15 17:37:06 STP: msti 0 set port 11 to discarding
Apr 15 17:37:07 STP: msti 0 set port 13 to discarding
Apr 15 17:37:07 STP: msti 0 set port 11 to learning
Apr 15 17:37:07 STP: msti 0 set port 11 to forwarding
Apr 15 17:37:08 Port: link state changed to 'down' on port 13
Apr 15 17:37:09 STP: msti 0 set port 10 to learning
Apr 15 17:37:09 STP: msti 0 set port 10 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 9 to learning
Apr 15 17:37:09 STP: msti 0 set port 9 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 8 to learning
Apr 15 17:37:09 STP: msti 0 set port 8 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 7 to learning
Apr 15 17:37:09 STP: msti 0 set port 7 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 6 to learning
Apr 15 17:37:09 STP: msti 0 set port 6 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 4 to learning
Apr 15 17:37:09 STP: msti 0 set port 4 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 3 to learning
Apr 15 17:37:09 STP: msti 0 set port 3 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 2 to learning
Apr 15 17:37:09 STP: msti 0 set port 2 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 1 to learning
Apr 15 17:37:09 STP: msti 0 set port 1 to forwarding
Apr 15 17:37:11 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:37:11 STP: msti 0 set port 13 to discarding
Apr 15 17:37:13 STP: msti 0 set port 13 to learning
Apr 15 17:37:14 STP: msti 0 set port 13 to forwarding
Apr 15 17:38:00 monitor: restarting vtss_appl
Apr 15 17:38:01 STP: msti 0 set port 1 to discarding
Apr 15 17:38:01 STP: msti 0 set port 2 to discarding
Apr 15 17:38:01 STP: msti 0 set port 3 to discarding
Apr 15 17:38:01 STP: msti 0 set port 4 to discarding
Apr 15 17:38:01 STP: msti 0 set port 6 to discarding
Apr 15 17:38:01 STP: msti 0 set port 7 to discarding
Apr 15 17:38:01 STP: msti 0 set port 8 to discarding
Apr 15 17:38:02 STP: msti 0 set port 9 to discarding
Apr 15 17:38:02 STP: msti 0 set port 10 to discarding
Apr 15 17:38:02 STP: msti 0 set port 11 to discarding
Apr 15 17:38:02 STP: msti 0 set port 13 to discarding
Apr 15 17:38:02 STP: msti 0 set port 11 to learning
Apr 15 17:38:02 STP: msti 0 set port 11 to forwarding
Apr 15 17:38:12 monitor: restarting vtss_appl
Apr 15 17:38:12 STP: msti 0 set port 1 to discarding
Apr 15 17:38:12 STP: msti 0 set port 2 to discarding
Apr 15 17:38:12 STP: msti 0 set port 3 to discarding
Apr 15 17:38:12 STP: msti 0 set port 4 to discarding
Apr 15 17:38:13 STP: msti 0 set port 6 to discarding
Apr 15 17:38:13 STP: msti 0 set port 7 to discarding
Apr 15 17:38:13 STP: msti 0 set port 8 to discarding
Apr 15 17:38:13 STP: msti 0 set port 9 to discarding
Apr 15 17:38:13 STP: msti 0 set port 10 to discarding
Apr 15 17:38:13 STP: msti 0 set port 11 to discarding
Apr 15 17:38:14 STP: msti 0 set port 13 to discarding
Apr 15 17:38:14 STP: msti 0 set port 11 to learning
Apr 15 17:38:14 STP: msti 0 set port 11 to forwarding
Apr 15 17:38:14 Port: link state changed to 'down' on port 13
Apr 15 17:38:16 STP: msti 0 set port 10 to learning
Apr 15 17:38:16 STP: msti 0 set port 10 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 9 to learning
Apr 15 17:38:16 STP: msti 0 set port 9 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 8 to learning
Apr 15 17:38:16 STP: msti 0 set port 8 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 7 to learning
Apr 15 17:38:16 STP: msti 0 set port 7 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 6 to learning
Apr 15 17:38:16 STP: msti 0 set port 6 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 4 to learning
Apr 15 17:38:16 STP: msti 0 set port 4 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 3 to learning
Apr 15 17:38:16 STP: msti 0 set port 3 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 2 to learning
Apr 15 17:38:16 STP: msti 0 set port 2 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 1 to learning
Apr 15 17:38:16 STP: msti 0 set port 1 to forwarding
Apr 15 17:38:17 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:38:18 STP: msti 0 set port 13 to discarding
Apr 15 17:38:20 STP: msti 0 set port 13 to learning
Apr 15 17:38:20 STP: msti 0 set port 13 to forwarding
Apr 15 17:39:07 monitor: restarting vtss_appl
Apr 15 17:39:08 STP: msti 0 set port 1 to discarding
Apr 15 17:39:08 STP: msti 0 set port 2 to discarding
Apr 15 17:39:08 STP: msti 0 set port 3 to discarding
Apr 15 17:39:08 STP: msti 0 set port 4 to discarding
Apr 15 17:39:08 STP: msti 0 set port 6 to discarding
Apr 15 17:39:08 STP: msti 0 set port 7 to discarding
Apr 15 17:39:08 STP: msti 0 set port 8 to discarding
Apr 15 17:39:08 STP: msti 0 set port 9 to discarding
Apr 15 17:39:08 STP: msti 0 set port 10 to discarding
Apr 15 17:39:08 STP: msti 0 set port 11 to discarding
Apr 15 17:39:09 STP: msti 0 set port 13 to discarding
Apr 15 17:39:09 STP: msti 0 set port 11 to learning
Apr 15 17:39:09 STP: msti 0 set port 11 to forwarding
Apr 15 17:39:10 Port: link state changed to 'down' on port 13
Apr 15 17:39:18 monitor: restarting vtss_appl
Apr 15 17:39:19 STP: msti 0 set port 1 to discarding
Apr 15 17:39:19 STP: msti 0 set port 2 to discarding
Apr 15 17:39:19 STP: msti 0 set port 3 to discarding
Apr 15 17:39:19 STP: msti 0 set port 4 to discarding
Apr 15 17:39:19 STP: msti 0 set port 6 to discarding
Apr 15 17:39:19 STP: msti 0 set port 7 to discarding
Apr 15 17:39:19 STP: msti 0 set port 8 to discarding
Apr 15 17:39:19 STP: msti 0 set port 9 to discarding
Apr 15 17:39:20 STP: msti 0 set port 10 to discarding
Apr 15 17:39:20 STP: msti 0 set port 11 to discarding
Apr 15 17:39:20 STP: msti 0 set port 13 to discarding
Apr 15 17:39:20 STP: msti 0 set port 11 to learning
Apr 15 17:39:20 STP: msti 0 set port 11 to forwarding
Apr 15 17:39:22 Port: link state changed to 'down' on port 13
Apr 15 17:39:22 STP: msti 0 set port 10 to learning
Apr 15 17:39:22 STP: msti 0 set port 10 to forwarding
Apr 15 17:39:22 STP: msti 0 set port 9 to learning
Apr 15 17:39:22 STP: msti 0 set port 9 to forwarding
Apr 15 17:39:22 STP: msti 0 set port 8 to learning
Apr 15 17:39:22 STP: msti 0 set port 8 to forwarding
Apr 15 17:39:22 STP: msti 0 set port 7 to learning
Apr 15 17:39:22 STP: msti 0 set port 7 to forwarding
Apr 15 17:39:22 STP: msti 0 set port 6 to learning
Apr 15 17:39:23 STP: msti 0 set port 6 to forwarding
Apr 15 17:39:23 STP: msti 0 set port 4 to learning
Apr 15 17:39:23 STP: msti 0 set port 4 to forwarding
Apr 15 17:39:23 STP: msti 0 set port 3 to learning
Apr 15 17:39:23 STP: msti 0 set port 3 to forwarding
Apr 15 17:39:23 STP: msti 0 set port 2 to learning
Apr 15 17:39:23 STP: msti 0 set port 2 to forwarding
Apr 15 17:39:23 STP: msti 0 set port 1 to learning
Apr 15 17:39:23 STP: msti 0 set port 1 to forwarding
Apr 15 17:39:25 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:39:25 STP: msti 0 set port 13 to discarding
Apr 15 17:39:27 STP: msti 0 set port 13 to learning
Apr 15 17:39:27 STP: msti 0 set port 13 to forwarding
Apr 15 17:56:53 dropbear[3027]: Exit before auth (user 'admin', 1 fails): Exited normally
Switch #2 stopped with the root bridge notifications as well:
Apr 15 17:38:39 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 17:39:34 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 17:39:45 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 19:18:07 UI: Configuration changed by admin
I'm assuming that the first switch's spanning tree process was already marginal from receiving invalid/unknown BPDUs from the Juniper switch, and that when the second switch was connected and began transmitting RSTP BPDUs, it just blew up and crashed repeatedly. I'm also assuming that whatever this process does includes controlling the SFPs (maybe it's in charge of the I2C bus?), so when it dies, it dumps the SFPs too.
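One way to sanity-check the "SFPs drop when the process dies" guess is to see whether the link-down events on ports 13 and 14 consistently land within a few seconds of each restart. Below is a minimal Python sketch of that check. The function name `link_downs_after_restart`, the 5-second window, and the placeholder year are all my own assumptions, and the sample lines are taken from the switch #1 log above. It only shows a timing correlation; it says nothing about what vtss_appl actually controls.

```python
import re
from datetime import datetime

# Syslog-style prefix 'Mon DD HH:MM:SS' followed by the message text.
LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (.*)$")
DOWN = re.compile(r"link state changed to 'down' on port (\d+)")

def link_downs_after_restart(log_text, window_s=5, year=2000):
    """Map each restart time to the ports that went down within window_s seconds.

    window_s and year are hypothetical defaults; the log itself has no year.
    """
    restarts, downs = [], []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if msg == "monitor: restarting vtss_appl":
            restarts.append(ts)
        else:
            d = DOWN.search(msg)
            if d:
                downs.append((ts, int(d.group(1))))
    result = {}
    for r in restarts:
        # Keep only link-down events at or shortly after this restart.
        result[r.strftime("%H:%M:%S")] = sorted(
            port for ts, port in downs if 0 <= (ts - r).total_seconds() <= window_s
        )
    return result

sample = """\
Apr 15 16:38:23 monitor: restarting vtss_appl
Apr 15 16:38:24 STP: msti 0 set port 1 to discarding
Apr 15 16:38:26 Port: link state changed to 'down' on port 13
Apr 15 16:38:26 Port: link state changed to 'down' on port 14
Apr 15 16:38:34 monitor: restarting vtss_appl
Apr 15 16:38:37 Port: link state changed to 'down' on port 13
"""
print(link_downs_after_restart(sample))
```

If every restart maps to a non-empty port list while no link-down events show up outside those windows, that would at least be consistent with the process taking the SFPs down with it.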
I did a few searches for vtss_appl and found quite a few varied results. I'm not sure if the fixes in 1.5.2 or 1.5.5 apply to this scenario: the former mentions STP but specifically calls out LAGs, and the latter isn't very specific. I see a lot of people complaining about vtss_appl crashes on 1.5.5, so if I can mitigate this issue through a different config, I don't see any compelling reason to go changing firmware.
If anyone could comment on this issue, that would be great. It's truly weird and, again, the first problem we've ever had with an astounding product.
This is my first post here, so apologies if inline logs are not the convention.
Thanks!