Graphing and stats issues
Posted: Tue Jan 29, 2019 1:25 am
I've never noticed this before, or at least was never sure there was a discrepancy in graphing, but tonight there was no question about it. On the 30-second graph the switch showed a 1Gb interface peaking clear up to 1.5Gb, although I didn't think to capture a screenshot while I had the 30-second graph up. Literally looking at realtime numbers, the router showed the interface barely peaking over 400Mb, but sure enough the switch was reporting about 3X that. While reviewing the graphs it seemed pretty clear that this was occurring on all ports on the switch.
The port details appear to have been affected too. Pause frames seem to be reported by the switch at about 3X what the router interfaces show, and on some APs I'm showing billions of RX pause frames (the switch was rebooted 10 days ago).
While this was occurring, the switch's status page showed sustained CPU usage quite high (upper 90%s), and then I noticed that the switch showed as disconnected in the Netonix Manager. I edited the switch in the manager (not making any changes, just to force a reconnect) and a few seconds later the switch showed back online. I switched back to the tab with the switch UI and suddenly traffic had returned to normal. I rechecked CPU usage and it was hovering at 60-70%.
I thought we had some SNMP monitors on some of the ports, since I was curious whether the incorrect data was also being reported via SNMP, but upon review we didn't. In the switch log, the only noteworthy entries were a warning about excessive pause frames from an AP about 6 hours prior and the same warning on another port just about the time the problem resolved. Historically these notices are rarely seen on this switch, which makes me wonder if incorrect stats were driving the messages.
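For anyone wanting to do the SNMP cross-check mentioned above, the arithmetic is simple enough to do by hand: take two readings of an interface octet counter a known number of seconds apart and convert the difference to bits per second. The sketch below is only an illustration of that calculation. The function name and the sample readings are made up for the example, and it does not talk to the switch or any SNMP library.

```python
def rate_bps(octets_start, octets_end, seconds, counter_bits=64):
    """Average bits per second between two octet-counter readings.

    Hypothetical helper: the readings would come from whatever poller
    you use; nothing here queries a device.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction tolerates a single counter wrap between samples.
    delta_octets = (octets_end - octets_start) % modulus
    return delta_octets * 8 / seconds


# Made-up readings taken 30 seconds apart (not from the actual switch).
print(rate_bps(1_000_000_000, 2_500_000_000, 30))
```

With those made-up readings the result is 400 Mb/s. Comparing a number computed this way against the switch graph for the same window would show whether the raw counters or only the graphing were off.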
All of this was very strange and with unreliable numbers it's a tough one. The log is below and I've attached several screen pics as well.
Presently running 1.5.0 and I'm thinking of upgrading to 1.5.1RC
Jan 28 15:25:52 switch[1055]: got excessive pause frames on port 9 (16009), count = 1
Jan 28 15:25:54 switch[1055]: got excessive pause frames on port 9 (18419), count = 2
Jan 28 15:25:56 switch[1055]: got excessive pause frames on port 9 (16037), count = 3
Jan 28 15:26:00 switch[1055]: got excessive pause frames on port 9 (19743), count = 1
Jan 28 16:02:46 Port: link state changed to 'up' (1G) on port 7 (tech laptop on site)
Jan 28 16:03:07 Port: link state changed to 'down' on port 7 (tech laptop on site)
Jan 28 16:03:27 Port: link state changed to 'up' (100M-F) on port 7 (tech laptop on site)
Jan 28 16:03:43 Port: link state changed to 'down' on port 7 (tech laptop on site)
Jan 28 16:03:46 Port: link state changed to 'up' (1G) on port 7 (tech laptop on site)
Jan 28 16:08:07 Port: link state changed to 'down' on port 7 (tech laptop on site)
Jan 28 16:08:28 Port: link state changed to 'up' (1G) on port 7 (tech laptop on site)
Jan 28 16:23:06 Port: link state changed to 'down' on port 7 (tech laptop on site)
Jan 28 20:52:09 switch[1055]: got excessive pause frames on port 11 (15387), count = 1
Jan 28 20:56:57 switch[1055]: got excessive pause frames on port 11 (16803), count = 1
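
To see at a glance which ports are generating the pause frame warnings in a log like the one above, the entries can be tallied per port. This is a rough sketch that assumes the line format shown in the log. The function name is hypothetical, and the number in parentheses is treated as an opaque value since its meaning isn't stated in the log.

```python
import re
from collections import Counter

# Matches lines like:
#   ... got excessive pause frames on port 9 (16009), count = 1
PAUSE_RE = re.compile(
    r"got excessive pause frames on port (\d+) \((\d+)\), count = (\d+)"
)


def summarize_pause_warnings(log_text):
    """Return {port: (number_of_warnings, largest_parenthesized_value)}."""
    warnings = Counter()
    largest = {}
    for line in log_text.splitlines():
        match = PAUSE_RE.search(line)
        if match is None:
            continue  # link state changes and other entries are ignored
        port = int(match.group(1))
        value = int(match.group(2))
        warnings[port] += 1
        largest[port] = max(largest.get(port, 0), value)
    return {port: (warnings[port], largest[port]) for port in warnings}


# Pause frame entries copied from the log above, plus one unrelated line.
SAMPLE_LOG = """\
Jan 28 15:25:52 switch[1055]: got excessive pause frames on port 9 (16009), count = 1
Jan 28 15:25:54 switch[1055]: got excessive pause frames on port 9 (18419), count = 2
Jan 28 15:25:56 switch[1055]: got excessive pause frames on port 9 (16037), count = 3
Jan 28 15:26:00 switch[1055]: got excessive pause frames on port 9 (19743), count = 1
Jan 28 16:02:46 Port: link state changed to 'up' (1G) on port 7 (tech laptop on site)
Jan 28 20:52:09 switch[1055]: got excessive pause frames on port 11 (15387), count = 1
Jan 28 20:56:57 switch[1055]: got excessive pause frames on port 11 (16803), count = 1
"""

for port, (count, peak) in sorted(summarize_pause_warnings(SAMPLE_LOG).items()):
    print(f"port {port}: {count} warnings, largest value {peak}")
```

Run against a longer export of the log, the same tally would show whether these warnings really are rare on this switch or cluster around the times the stats looked wrong.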