Latency increases dramatically, fixed by reboot

flameproof
Member
 
Posts: 22
Joined: Thu Sep 24, 2015 6:04 am
Has thanked: 0 time
Been thanked: 1 time

Latency increases dramatically, fixed by reboot

Sat Apr 23, 2016 3:48 pm

I am noticing a problem that seems to happen at random on our switches. Occasionally they are reported as "down" by our network monitoring system, but in fact the switch is not really down; its latency has just increased significantly. Here is a continuously running ping that spans a reboot. You can see the ping times go from erratic, high values to sub-5ms, which is what we see across our entire network when it is operating correctly:


Code:
64 bytes from 172.16.255.151: icmp_seq=14 ttl=63 time=26.1 ms
64 bytes from 172.16.255.151: icmp_seq=15 ttl=63 time=139 ms
64 bytes from 172.16.255.151: icmp_seq=16 ttl=63 time=854 ms
64 bytes from 172.16.255.151: icmp_seq=17 ttl=63 time=853 ms
64 bytes from 172.16.255.151: icmp_seq=18 ttl=63 time=1003 ms
64 bytes from 172.16.255.151: icmp_seq=19 ttl=63 time=575 ms
64 bytes from 172.16.255.151: icmp_seq=20 ttl=63 time=343 ms
64 bytes from 172.16.255.151: icmp_seq=21 ttl=63 time=5.80 ms
64 bytes from 172.16.255.151: icmp_seq=22 ttl=63 time=23.6 ms
64 bytes from 172.16.255.151: icmp_seq=23 ttl=63 time=630 ms
64 bytes from 172.16.255.151: icmp_seq=24 ttl=63 time=590 ms
64 bytes from 172.16.255.151: icmp_seq=25 ttl=63 time=373 ms
64 bytes from 172.16.255.151: icmp_seq=26 ttl=63 time=193 ms
64 bytes from 172.16.255.151: icmp_seq=27 ttl=63 time=809 ms
64 bytes from 172.16.255.151: icmp_seq=28 ttl=63 time=309 ms
64 bytes from 172.16.255.151: icmp_seq=29 ttl=63 time=3.85 ms
64 bytes from 172.16.255.151: icmp_seq=30 ttl=63 time=523 ms
64 bytes from 172.16.255.151: icmp_seq=31 ttl=63 time=718 ms
64 bytes from 172.16.255.151: icmp_seq=32 ttl=63 time=12.2 ms
64 bytes from 172.16.255.151: icmp_seq=33 ttl=63 time=703 ms
64 bytes from 172.16.255.151: icmp_seq=34 ttl=63 time=16.7 ms
64 bytes from 172.16.255.151: icmp_seq=60 ttl=63 time=8.64 ms
64 bytes from 172.16.255.151: icmp_seq=61 ttl=63 time=3.91 ms <- AFTER REBOOT
64 bytes from 172.16.255.151: icmp_seq=62 ttl=63 time=2.66 ms
64 bytes from 172.16.255.151: icmp_seq=63 ttl=63 time=2.66 ms
64 bytes from 172.16.255.151: icmp_seq=64 ttl=63 time=4.26 ms
64 bytes from 172.16.255.151: icmp_seq=65 ttl=63 time=3.50 ms
64 bytes from 172.16.255.151: icmp_seq=66 ttl=63 time=3.20 ms
64 bytes from 172.16.255.151: icmp_seq=67 ttl=63 time=3.84 ms
64 bytes from 172.16.255.151: icmp_seq=68 ttl=63 time=3.88 ms
64 bytes from 172.16.255.151: icmp_seq=69 ttl=63 time=3.86 ms
64 bytes from 172.16.255.151: icmp_seq=70 ttl=63 time=2.22 ms
64 bytes from 172.16.255.151: icmp_seq=71 ttl=63 time=3.36 ms
64 bytes from 172.16.255.151: icmp_seq=72 ttl=63 time=2.99 ms
64 bytes from 172.16.255.151: icmp_seq=73 ttl=63 time=2.82 ms


Any ideas?
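
For anyone who wants to compare the two halves of a capture like the one above, here is a minimal Python sketch. It assumes Linux-style ping reply lines in the format shown in this post; the function names (`parse_replies`, `split_at_gaps`, `summarize`) are hypothetical and not part of any tool mentioned in this thread. It splits the replies wherever icmp_seq skips (for example, the jump from 34 to 60 while the switch was rebooting) and summarizes each run.

```python
import re
from statistics import mean

# Matches the reply lines shown above, e.g.
# "64 bytes from 172.16.255.151: icmp_seq=14 ttl=63 time=26.1 ms"
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def parse_replies(text):
    """Return a list of (icmp_seq, rtt_ms) tuples found in ping output."""
    return [(int(m.group(1)), float(m.group(2))) for m in REPLY.finditer(text)]


def split_at_gaps(replies):
    """Split replies into consecutive runs; a skipped icmp_seq starts a new run."""
    runs = []
    for seq, rtt in replies:
        if runs and seq == runs[-1][-1][0] + 1:
            runs[-1].append((seq, rtt))
        else:
            runs.append([(seq, rtt)])
    return runs


def summarize(run):
    """Return min/avg/max round-trip time (ms) for one run of replies."""
    rtts = [rtt for _, rtt in run]
    return {
        "first_seq": run[0][0],
        "last_seq": run[-1][0],
        "count": len(rtts),
        "min": min(rtts),
        "avg": round(mean(rtts), 2),
        "max": max(rtts),
    }
```

Feeding the saved ping output through `summarize(run)` for each run from `split_at_gaps(parse_replies(text))` gives one summary per stretch of uninterrupted replies, which makes the before/after difference easier to quote than raw lines.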

sirhc
Employee
 
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Latency increases dramatically, fixed by reboot

Sat Apr 23, 2016 4:10 pm

It would help to know your model and firmware version, but even so, there is no known issue like that.

More than likely there is traffic on your switch somewhere; when you reboot the switch you break the traffic stream, and by the time the switch comes back up the traffic has stopped.

Maybe investigate each port and track down the offending traffic. You could disable the ports one at a time.
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.

lligetfa
Associate
 
Posts: 1191
Joined: Sun Aug 03, 2014 12:12 pm
Location: Fort Frances Ont. Canada
Has thanked: 307 times
Been thanked: 381 times

Re: Latency increases dramatically, fixed by reboot

Sat Apr 23, 2016 4:37 pm

I assume you are pinging the switch itself? If so, it is not a good measure of network latency, as the switch probably puts a very low priority on answering pings. I had a bunch of HP Procurve 2524 switches that got really lazy about answering pings if the switch CPU went over 25%. I stopped using them in high-traffic areas because my NMS would false alert on them.

Are you taxing the switch CPU with SNMP? What do you get if you ping a device beyond the switch, preferably a device that doesn't get crazy busy?

sirhc
Employee
 
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Latency increases dramatically, fixed by reboot

Sat Apr 23, 2016 4:47 pm

I am guessing there is some sort of traffic on his network, and when he reboots the switch the stream stops and all is good again.

But it would still help to know the model and firmware.

I mean, all the switch models have the exact same switch core and CPU, just with more or fewer ports, but it is still nice to know when someone reports an issue.

flameproof
Member
 
Posts: 22
Joined: Thu Sep 24, 2015 6:04 am
Has thanked: 0 time
Been thanked: 1 time

Re: Latency increases dramatically, fixed by reboot

Mon Apr 25, 2016 2:29 pm

So this happens on both our WS-12-250A switches (we have 5, and I've seen it happen on all of them) and our WS-24-400B. They all run FW 1.3.9.

As for traffic, our system is not live yet, so we have very little traffic; peaks are 1Mbps. When I see this happening, the traffic levels on the switch are normal, with no peaks or high sustained rates.

I have SNMP enabled, but since I consider all SNMP monitoring platforms to be bloated, inefficient, or too aggressive on hardware and network resources, I built my own. It does a single ping per minute and alerts if the average latency over X minutes exceeds Y. It also connects via SSH/HTTPS and grabs JSON-format status info once every 10 minutes. See the screenshot of my "dashboard"... (all PHP/HTML/JS based, road names removed for privacy)
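
To illustrate the kind of check described here (one sample per minute, alert when the average over a window goes past a limit), here is a minimal sketch. It is written in Python rather than the PHP used by the actual tool, and the class name `LatencyMonitor`, the window size, and the threshold are all hypothetical stand-ins for the unspecified X and Y; it is not the poster's code.

```python
from collections import deque


class LatencyMonitor:
    """Rolling-average latency check over a fixed number of samples."""

    def __init__(self, window, threshold_ms):
        # window: number of samples to average (one per minute in this thread)
        # threshold_ms: alert when the window average exceeds this value
        self.samples = deque(maxlen=window)
        self.threshold_ms = threshold_ms

    def add(self, rtt_ms):
        """Record one round-trip time; return True if an alert should fire."""
        self.samples.append(rtt_ms)
        # Stay quiet until the window is full so a single early spike
        # cannot trigger an alert on its own.
        if len(self.samples) < self.samples.maxlen:
            return False
        average = sum(self.samples) / len(self.samples)
        return average > self.threshold_ms
```

With `LatencyMonitor(window=3, threshold_ms=100)`, two normal samples followed by one 854 ms sample would raise the alert, and it would clear again once three consecutive low samples fill the window. How to treat a lost ping (skip it, or record a large penalty value) is left open here.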

Screenshot at Apr 25 20-23-13.png

sirhc
Employee
 
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Latency increases dramatically, fixed by reboot

Mon Apr 25, 2016 2:32 pm

I hate to say this but there has to be something going on with your network.

There are over 12,000 switches in service and a bug like this would surely be upsetting a LOT of people.

Plus I have 25+ of these switches in service at my WISP and do not see this issue.

You could start by posting up all of your Config Tabs and explaining your network configuration.

Then post up your Switch Log and Device/Status Tab from just before you issue the reboot.

READ NEXT POST

sirhc
Employee
 
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Latency increases dramatically, fixed by reboot

Mon Apr 25, 2016 2:33 pm

If you have a large flat network, you should upgrade to v1.4.0rc12 or disable UBNT Discovery, as an issue was found with Discovery that was fixed in v1.4.0rcX.

I would suggest upgrading to v1.4.0rc12 and seeing what happens.

flameproof
Member
 
Posts: 22
Joined: Thu Sep 24, 2015 6:04 am
Has thanked: 0 time
Been thanked: 1 time

Re: Latency increases dramatically, fixed by reboot

Mon Apr 25, 2016 2:40 pm

OK, will do. The config is a star topology: one central site (WS-24) connected to five sub-nodes, each with a WS-12. There are 5 Ubiquiti NanoBeam ac units on the WS-24, each linking to the matching NanoBeam ac on one of the five WS-12s. On the WS-24, an AirFiber connects back to the fiber local loop.

The NanoBeam ac ports on the WS-24 are on one VLAN, with ports not isolated.

On each WS-12, there are a number (between 3 and 8) of NanoStation M5, which provide backhaul to the access points. The access points have dual 5GHz/2.4GHz radios. The 5GHz side connects back to the nearest M5, and 2.4GHz provides user device access. On the WS-12, the M5 ports are on the same VLAN, but isolated from each other.

I don't have storm control enabled on any of the switches, but loop protection is enabled. Discovery is disabled on all switches.

Next time this happens I'll share status & config details of the affected switch.

After reading your suggestion: although UBNT Discovery is disabled, I'll update to 1.4.0rc12 and see what happens.

flameproof
Member
 
Posts: 22
Joined: Thu Sep 24, 2015 6:04 am
Has thanked: 0 time
Been thanked: 1 time

Re: Latency increases dramatically, fixed by reboot

Thu Apr 28, 2016 5:47 pm

So, I'm having this issue on a WS-12-250A right now. I have taken screenshots of all relevant tabs, and even the top command run over SSH. As you can see, there is almost NO traffic through the switch.

In addition, ping times to the NanoBeam feeding into the switch are at the normal 2.5ms average, while pings to anything across the switch, on the other side so to speak, are just as bad as pings to the switch itself.

I've read some comments from people having nasty issues with the suggested rc firmware. Is it safe to use in a production environment?

Screenshot at Apr 28 23-35-09.png


Screenshot at Apr 28 23-34-43.png


Screenshot at Apr 28 23-39-11.png


Screenshot at Apr 28 23-41-18.png

sirhc
Employee
 
Posts: 7416
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: Latency increases dramatically, fixed by reboot

Thu Apr 28, 2016 6:03 pm

I am running v1.4.0rc12 in production.

There is a small memory leak in rc12 with Discovery, but it would take many, many days to cause an issue, which would eventually result in the switch rebooting.

Memory leak discussed in this thread: viewtopic.php?f=17&t=1672

You can simply leave Discovery disabled on the Device/Configuration Tab, but it would take many days for the memory leak to cause a problem, and we hope to have the next rc version released by Monday.
