DC switches cold-reboot upon sudden voltage drops?
Posted: Tue Feb 06, 2018 6:20 pm
We're seeing repeated cases of DC switches rebooting upon an AC power surge or outage.
To clarify, all switches are directly connected to the battery bank (with only a fuse in between), and the AC chargers are also connected directly to the batteries (with a fuse in between).
Occasionally, upon an AC-grid failure or a surge, the switch reboots (a cold reboot: all PoE devices go down with it).
It comes back seconds later (the time it takes to boot) and continues to work just fine on the same batteries, even without AC restored to the charger.
The batteries are good, and would support the site for hours (or days), without AC to the charger.
Obviously, there would be a drop in voltage when the charger stops charging, but that should not affect the Netonix switches - this defeats the key objective of using these switches in the first place!
I initially thought that perhaps the "Hibernation" function kicks in too quickly, but this happens both on WS-X-250-DC models (with the smart DC-DC converter that supports hibernation) and on WS-X-150-DC models (without the DC-DC converter, so no hibernation).
This does *not* happen all the time, and I cannot find a pattern for when it does happen.
Today, we had an AC-grid outage that affected about 40 sites, with 40 Netonix switches.
Only two rebooted; the other sites continued fine. The two that rebooted came back up seconds later (the radios about a minute later) and continued to work on batteries, without AC, for hours until AC was restored.
Some more information that might be relevant:
This happens at both 48V and 24V sites.
All 24V sites use the same charger, and all 48V sites use the same charger.
Sites use different battery-bank sizes, from 4x200Ah (AGM) at large sites to 2x50Ah (AGM) at smaller sites.
I can confirm, with certainty, that two years ago this would not happen!
I used to visit every site and yank the AC power as a test, without any harm (I loved getting the notification about the AC-power failure {voltage drop} on my phone, via our NMS as well as via direct email from the Netonix switch on site).
About 18 months ago, when this problem started happening routinely, I stopped this practice because it was too harmful to the network. I reported it to Netonix, and some firmware upgrades were released, but we never tested them properly, and I can confirm that it's still not fixed.
I don't know whether the problem started with a firmware upgrade or perhaps with newer hardware board revisions. We did swap and replace quite a few switches over the years, and we have not tracked which hardware version is where.
Obviously, at sites where the batteries have not been replaced, we're looking at older batteries, but they do hold for hours; it's just the initial instant of the AC drop that causes this. Hence, we don't replace batteries that are still good.
All switches are running fairly recent firmware.
The oldest is 1.4.8rc10 and the newest is 1.4.9, with the vast majority of switches at 1.4.8.
We have many solar-powered sites, where this problem does not manifest (obviously), and these have ~150 days of uptime (since the previous firmware upgrade).
Sadly, most AC-powered sites (an AC charger that charges the batteries) do not enjoy such long uptimes.
* Unrelated, but there's a bug in the Netonix manager: it sorts the "uptime" column as text rather than as a numeric value, which makes this kind of sorting difficult.
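To illustrate the sorting issue, here is a minimal Python sketch contrasting a text sort with a sort on the parsed duration. The uptime string format used here ("150d 4h" and so on) and the helper name `uptime_to_seconds` are hypothetical; the actual format shown by the Netonix manager may differ.

```python
import re

# Hypothetical uptime strings; the real Netonix manager format may differ.
uptimes = ["9d 2h", "150d 4h", "12d 0h", "3h 15m"]

# Seconds per unit suffix (assumed d/h/m/s suffixes).
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def uptime_to_seconds(text: str) -> int:
    """Convert an uptime string such as '150d 4h' into total seconds."""
    return sum(
        int(value) * UNIT_SECONDS[unit]
        for value, unit in re.findall(r"(\d+)\s*([dhms])", text)
    )


# Text (lexicographic) sort: compares character by character, so "12d" < "150d" < "3h" < "9d".
text_sorted = sorted(uptimes)

# Numeric sort: compares the actual duration each string represents.
numeric_sorted = sorted(uptimes, key=uptime_to_seconds)

print(text_sorted)
print(numeric_sorted)
```

With a text sort, a switch that has been up for 3 hours lands between switches that have been up for 150 and 9 days; sorting on the parsed value puts them in true uptime order.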
I am fairly certain we did not have any such issues when we started with Netonix about 3 years ago.
This issue was "introduced" quite a while afterwards and has been hurting us ever since.
Suggestions?
Anyone else seeing this?
Thanks!
Yahel.
Yahel Ben-David, Ph.D.
Executive Director
[color=#45818e]Bridging the gap between research and impact.[/color]