Overheating switch
Posted: Wed Sep 09, 2015 10:58 pm
Tried to PM this to Chris, but the message seems stuck in my outbox, so here's hoping some of you have ideas or he is up reading late.
Long story short (and for us this is a mini-emergency): the WISP Switch (WS-24-400A) at our main distribution point is acting up in a major way tonight. It had been chugging along nicely on 1.1.8, and since that was the big release where a bunch of bugs were fixed, we felt OK leaving it there. About four days ago we added another AirFiber, which brought the number powered by the switch to four and the total power usage on the switch to about 290 watts. It ran OK until tonight, when I started getting pages that things had gone down. When I logged into the switch, only port 4 of the high-powered ports was drawing the right amount of power (about 49 watts). The other three ports showed 48VH as enabled, but had no link light and only 1 or 2 watts on each port.
I restarted the switch and then hit an issue I have had before, where it would not come back up. I was doing all of this on the road through a remote desktop session on my iPhone, and tried to grab screenshots along the way so I can show you what I encountered. Fortunately, after I lost access to the remote session, I still had an LTE backup to a VM running at the site. I connected to that, and then through an industrial PC at the site, connected to the WISP switch with PuTTY over the serial port. The stalled boot has happened on at least three occasions before, across various versions of firmware... all I have to do is hit enter a couple of times and the switch finishes booting. However, once it had restarted and I could access the GUI again, the interface still showed the same status: only port 4 of the high-powered ports was active (all of the other ports, 5-24, seemed to be fine by the way and continued to pass traffic). I then turned off the power to ports 1-3 and let the switch sit for about 30 seconds. Then I turned them on one by one, waiting between each to see if the power would ramp up. It did, and I saw the reported consumption go to around 24W and then to 48-50W as the AirFibers continued booting. I was able to bring up all three ports again and finally exhaled. Then, after about ten minutes of uptime, the situation repeated itself.
Looking at the status tab, the temperatures seemed pretty hot. All three fans are pegged at around 10K RPM. Board temp is 61C/142F, CPU temp is 96C/205F, PHY temp for ports 1-12 is 96C/205F, and PHY temp for ports 13-24 is 61C/142F. I got desperate and powered down the three ports again, and watched the CPU and PHY temps fall to 165F (about 74C) fairly quickly. I upgraded the firmware to 1.3.2 hoping that would be the fix. CPU usage is at 49%. I just powered up ports 1, 2, and 4, and things have been up solid for about 30 minutes. I am about to enable port 3 again now, but even with only 1, 2, and 4 powered up, the temperature on the CPU and PHY seems stuck at 96C/205F. The only change I can think of is the addition of the new AirFiber, though as I said earlier, the total power usage with all four active (plus whatever powered ports are on 5-24) is around 280-290 watts. With just the three (plus whatever powered ports are on 5-24) it is at 240 watts, so I don't think we are pushing the switch too hard. Peak outdoor temperature here today was around 103F, if that helps unravel the mystery at all.
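For anyone checking the numbers, here is a quick Python sketch of the arithmetic above. It converts the status-tab readings to Fahrenheit and works out the remaining PoE headroom at the two reported loads. The 400 W budget is an assumption taken from the "400" in the model name, not a confirmed spec, and the function and variable names are made up for illustration.

```python
# Readings copied from the status tab in this post (degrees Celsius).
READINGS_C = {"Board": 61, "CPU": 96, "PHY 1-12": 96, "PHY 13-24": 61}

# ASSUMPTION: total PoE budget inferred from the model name WS-24-400A.
ASSUMED_BUDGET_W = 400


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def headroom(load_w, budget_w=ASSUMED_BUDGET_W):
    """Watts left between the reported load and the assumed budget."""
    return budget_w - load_w


if __name__ == "__main__":
    for name, temp_c in READINGS_C.items():
        print(f"{name}: {temp_c}C = {round(c_to_f(temp_c))}F")
    # Reported loads: all four AirFibers up, then only three.
    for label, load_w in {"four AirFibers": 290, "three AirFibers": 240}.items():
        print(f"{label}: {load_w}W used, {headroom(load_w)}W spare (assumed budget)")
```

If that budget assumption holds, both loads leave more than 100 W spare, which fits the impression that the switch is not being pushed too hard on power alone.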
Thanks in advance.