We have been getting notifications from our Nagios monitoring about the SNMP plugin timing out when executing system calls. This started a few days ago and has continued since, and we've noticed the UI slowing down.
Last night we swapped our older switch for an RMA'd switch that had a hardware replacement done. It came with 2.0.6rc3 instead of the 2.0.4 our current switch runs, so I updated it to 2.0.6rc4 to have the latest and then copied the config from our current switch to the new one. Everything looked fine, but after we installed it and put it into place, the new switch became very slow. The UI took a very long time to respond or update data, if it did at all.
We weren't sure whether this was a hardware or firmware issue, so we put the original switch running 2.0.4 back in place. After about 30 minutes we began getting the same SNMP notifications, and the UI is crawling, if it does anything at all. From what we can see, traffic is still being passed, but we aren't sure whether this is a sign of the switches heading towards a crash.
Both switches are WS3-14-600-AC. One on 2.0.4 (current switch) and one on 2.0.6rc4 (RMAd switch with hardware replacement).
We noticed that the switch (when it does update UI information) shows CPU usage at 98%.
Below is some information that will hopefully help.
# uptime
14:20:32 up 2:02, load average: 55.73, 56.16, 55.73
#
# top
Mem: 143632K used, 109556K free, 0K shrd, 25920K buff, 42336K cached
CPU: 43% usr 55% sys 0% nic 0% idle 0% io 0% irq 0% sirq
Load average: 57.02 56.80 55.65 58/144 4121
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
45 1 admin S 478m 193% 12% /usr/bin/switch_app
453 1 admin R 2448 1% 2% netonix_app -c
504 1 admin R 2448 1% 2% netonix_app -c
983 1 admin R 2448 1% 2% netonix_app -c
1823 1 admin R 2448 1% 2% netonix_app -c
2166 1 admin R 2448 1% 2% netonix_app -c
2955 1 admin R 2448 1% 2% netonix_app -c
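For anyone who wants to quantify the pile-up from a saved paste rather than by eye, here is a minimal Python sketch that counts processes per command in `top` output like the above. It is meant to run on a workstation against copied text, not on the switch. The function name `count_commands` is made up for illustration, and it assumes the STAT column is a single token, as it is in this paste.

```python
from collections import Counter


def count_commands(top_output: str) -> Counter:
    """Count processes per command line in BusyBox-style `top` output.

    Assumes the column layout seen in the paste:
    PID PPID USER STAT VSZ %VSZ %CPU COMMAND...
    """
    counts = Counter()
    in_table = False
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            # Everything after the header row is the process table.
            in_table = True
            continue
        if in_table and fields[0].isdigit() and len(fields) >= 8:
            # The command (with its arguments) starts at the 8th column.
            counts[" ".join(fields[7:])] += 1
    return counts


sample = """\
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
45 1 admin S 478m 193% 12% /usr/bin/switch_app
453 1 admin R 2448 1% 2% netonix_app -c
504 1 admin R 2448 1% 2% netonix_app -c
983 1 admin R 2448 1% 2% netonix_app -c
"""

for command, n in count_commands(sample).most_common():
    print(n, command)
```

Feeding it the full `top` paste gives one count per distinct command, which makes it easy to compare a bogged-down switch against a healthy one.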
WISP WS3-14-600-AC bogged down
-
Stephen - Employee
- Posts: 1033
- Joined: Sun Dec 24, 2017 8:56 pm
- Has thanked: 85 times
- Been thanked: 181 times
Re: WISP WS3-14-600-AC bogged down
Noted, there are a few things on the list for the next full release. I'll add this in for evaluation as well.
- OacysShop
- Member
- Posts: 29
- Joined: Thu Dec 28, 2017 7:10 pm
- Has thanked: 3 times
- Been thanked: 0 time
Re: WISP WS3-14-600-AC bogged down
After further testing, we factory reset our RMA'd switch and got the same results. The UI was extremely slow, if it responded at all.
Load average: 0.47 0.51 0.26 1/95 427
We tested further and rebooted the switch to the same results. Finally we rebooted the switch but this time did NOT open up the Web UI and load averages stayed low:
Load average: 0.67 0.60 0.33 2/95 507
We also only showed two instances of netonix_app -c instead of MANY from the log example above.
Upon further testing, we opened the Web UI in Firefox and it was very responsive, and top showed good load averages and low CPU usage. We then opened it again in Chrome (which is what we've always used) and it is now very responsive as well; top no longer shows high load averages or high CPU use.
Testing again this morning, load averages are about:
Load average: 2.14 2.01 1.82 2/94 7129
The Web UI is still responsive in Chrome. It locks up every now and then but eventually continues to show data as expected.
After closing Chrome and testing in Firefox, we saw the same results as in Chrome. This time, though, there was only one instance of netonix_app.
Please let me know if there is any other information you may need that will help.
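To compare readings like the ones in this post, here is a small Python sketch that parses the `Load average:` line printed by `top`. The fields follow the Linux `/proc/loadavg` layout: 1, 5 and 15 minute averages, running/total tasks, and the most recent PID. The `looks_overloaded` helper and its threshold are assumptions for illustration only, as is the single-CPU default. The comma-separated `uptime` format is not handled.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one_min: float
    five_min: float
    fifteen_min: float
    running: int   # tasks currently runnable
    total: int     # total tasks on the system
    last_pid: int  # most recently created PID


def parse_load_line(line: str) -> LoadAvg:
    """Parse 'Load average: 57.02 56.80 55.65 58/144 4121' (or raw /proc/loadavg)."""
    parts = line.split(":", 1)[-1].split()
    running, total = parts[3].split("/")
    return LoadAvg(
        float(parts[0]), float(parts[1]), float(parts[2]),
        int(running), int(total), int(parts[4]),
    )


def looks_overloaded(load: LoadAvg, cpus: int = 1, factor: float = 4.0) -> bool:
    # Hypothetical rule of thumb: flag a 1-minute load several times the CPU count.
    return load.one_min > cpus * factor


for line in (
    "Load average: 57.02 56.80 55.65 58/144 4121",  # bogged down
    "Load average: 0.67 0.60 0.33 2/95 507",        # after reboot, Web UI closed
    "Load average: 2.14 2.01 1.82 2/94 7129",       # this morning
):
    load = parse_load_line(line)
    print(load.one_min, load.running, load.total, looks_overloaded(load))
```

The running/total pair is worth watching alongside the averages: the bad reading has 58 runnable tasks out of 144, against 2 out of 95 on the freshly rebooted switch.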
-
Stephen - Employee
Re: WISP WS3-14-600-AC bogged down
I actually already have an idea of what might have happened, but just for process of elimination: do you have an instance of Netonix Manager running on your network?
- OacysShop
- Member
Re: WISP WS3-14-600-AC bogged down
We do, and I thought that might be an issue as well. Yesterday I removed the switches that were on it (just these two), but the issue remained until late yesterday. Early this morning the random lockups were happening on the switch currently handling traffic, while our backup (RMA) switch is working as expected.
-
Stephen - Employee
Re: WISP WS3-14-600-AC bogged down
If the manager was causing extraneous processes to be launched and remain open, turning the manager off would prevent it from getting worse, but the switch would need to be rebooted to clear the orphaned processes that are eating memory/CPU.
If we can confirm that's the source of what you're seeing, that will definitely go a long way toward getting this bug fixed sooner rather than later.
- OacysShop
- Member
Re: WISP WS3-14-600-AC bogged down
So far, since removing the switches from the manager this morning, we haven't had them get completely bogged down (just random lockups that recover after a bit), which is good. Just curious whether the manager would cause all these processes to be launched at the same time? We checked, and most of them appear to have started at the same time, roughly 2 hours or so after the switch was rebooted:
# time uptime
17:20:35 up 5:02, load average: 57.93, 57.68, 57.48
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2030
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/205
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/206
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/207
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/208
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/209
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2097
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2166
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2233
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2319
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2390
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2462
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2472
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2535
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2599
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2669
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2739
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2811
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2883
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2955
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/3027
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/3100
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/3182
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/3251
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/3355
After manually killing them off (last night before removing the switches from the manager) the switch was much more responsive.
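As a rough way to check the "all started at the same time" observation on a longer listing, here is a Python sketch that groups `/proc/<pid>` entries by their timestamp. It takes text in the format pasted above (the command that produced it is not shown in the post, so the input is assumed to be a long-format directory listing). Like the post, it treats the directory timestamp as only an approximate start time. `group_by_start` is a made-up name.

```python
from collections import defaultdict


def group_by_start(listing: str) -> dict:
    """Group PIDs by the 'Mon D HH:MM' timestamp of their /proc/<pid> entry."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: perms links owner group size month day time path
        if len(fields) != 9 or not fields[8].startswith("/proc/"):
            continue
        pid = fields[8][len("/proc/"):]
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/net
        stamp = " ".join(fields[5:8])
        groups[stamp].append(int(pid))
    return dict(groups)


sample = """\
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/2030
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/205
dr-xr-xr-x 8 admin root 0 Jun 8 14:13 /proc/206
"""

for stamp, pids in group_by_start(sample).items():
    print(stamp, len(pids), pids)
```

If the manager were spawning one stray process per poll, you would expect the PIDs to spread across many timestamps rather than cluster on a single minute, so the grouping is a quick way to tell the two patterns apart.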
-
Stephen - Employee
Re: WISP WS3-14-600-AC bogged down
I wouldn't think it happens all at once. The manager regularly polls switches for their status if they're being monitored, and each of those requests could potentially cause trouble if not handled correctly.