Stephen wrote:
Hello RTGLW,
Couple questions to see if we can figure out what's going on.
For the switches that had Discovery disabled and are still growing in memory: if you reboot them, does the memory growth stop?
For the same set of switches that continued increasing in memory usage after Discovery was disabled, is there anything else different you can tell us about their configuration? Even if it seems small, such as: were SFP ports plugged in, or were services or ports enabled/disabled that differ from the other switches? etc.
There were memory leaks found in SNMP and in the Watchdog functionality that were patched in 1.5.16. However, since the switch has so many different possible configurations, differing configs could expose other leaks that haven't otherwise been caught, or that wouldn't show up in an average deployment.
That being said, since you mentioned you saw that vtss_appl had large memory consumption in a few instances: for these switches, did they have any of the following attributes?
- SFP ports plugged in? If so, do you see any I2C errors in the switch log on these units?
- LAGs or LACP configured? Is there anything in the logs related to these services?
For the switches that continued showing memory loss where the offending process was not vtss_appl: can you tell us which process it was? ps aux is fine for this kind of check; on a production unit, that's the best that can be done, and knowing which process it is will definitely help narrow down the issue. A rough way to capture this over time is sketched below.
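If it helps, here's a minimal sketch for logging the top memory consumers periodically so you can spot the grower later. It assumes a procps-style ps on the unit (a BusyBox ps may format columns differently), and /tmp/mem.log is just a placeholder path:
Code: Select all
# Log the ten largest processes by RSS (ps aux column 6, in KiB)
# once per hour. /tmp/mem.log is a placeholder; use any writable path.
while true; do
    date >> /tmp/mem.log
    ps aux | sort -rnk 6 | head -n 10 >> /tmp/mem.log
    sleep 3600
done
Comparing snapshots a day or two apart should make the offending process obvious even if no single reading looks alarming.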
Just in case it's still SNMP that's causing problems: if you have a Linux box lying around anywhere, you can try running this in a loop to expose the problem more blatantly.
1. Install net-snmp; on Ubuntu, you can use snap to install it (sudo snap install net-snmp): https://snapcraft.io/install/net-snmp/ubuntu
- if on another distribution, I leave the details to you.
2. In a bash terminal (after net-snmp is installed) run the following:
- Code: Select all
while true; do snmpwalk -v 2c -c public <switch-ip>; done
This will continuously query the SNMP tree (replace <switch-ip> with the switch's management IP). I have tools that do similar things to try to expose issues. You can launch this in a few terminals to increase the effect; a small helper for launching several at once is sketched below.
If SNMP is the cause, the memory loss should increase dramatically while this is running. If you notice this, please let us know here, along with any other details such as the switch configuration, so we can try to replicate it. (I suppose this is obvious, but I'll say it anyway: please don't do this on switches in production.)
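For convenience, here's a minimal sketch that runs several walkers in parallel from one terminal. The target address is a placeholder you'd replace with your switch's management IP:
Code: Select all
# Hypothetical helper: run four concurrent full-tree walks to amplify
# any SNMP-related memory growth. Replace TARGET with the switch's IP.
TARGET=192.0.2.1
for i in 1 2 3 4; do
    ( while true; do snmpwalk -v 2c -c public "$TARGET" >/dev/null; done ) &
done
wait
Ctrl-C (or killing the terminal) stops all the background loops.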
Memory does go up over time during normal operation, but it should eventually level off and also clear itself. There is caching in the kernel, slightly different from an average kernel's, that is meant to keep frames moving smoothly across the interface between the CPU and the switch core. High traffic loads can grow this cache, but as traffic fluctuates up and down, it should be cleaned up as it goes along. Depending on the load, that might be all it is. One rough way to tell this cache apart from a real leak is sketched below.
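As a quick check (a sketch, assuming you have shell access to the unit and its kernel is new enough to report MemAvailable in /proc/meminfo): if the usage is reclaimable cache, MemAvailable should stay roughly flat even while MemFree falls; with a real leak, both fall together.
Code: Select all
# Sample memory counters every 10 minutes. With reclaimable cache,
# MemAvailable stays roughly flat while MemFree drops; with a leak,
# MemAvailable drops as well.
while true; do
    date
    grep -E '^(MemFree|MemAvailable|Cached):' /proc/meminfo
    sleep 600
done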
In a similar vein, if pause frames are enabled, try disabling them to see if that helps.
Hey Stephen, appreciate all the information provided on this. To answer some of your immediate questions:
For the switches that had Discovery disabled and are still growing in memory: if you reboot them, does the memory growth stop?
To keep things mostly relevant to the newest FW release, I've rebooted our 1.5.17rc2 host and will monitor over the coming week(s) to see if memory growth stops, as Discovery was only disabled after the FW upgrade and reboot. BUT, I can confirm that a 1.5.15rc3 host had to be rebooted after we disabled Discovery due to its memory being so low, and that host has NOT had its memory increase since, with 43 days of uptime.
For the same set of switches that continued increasing in memory usage after Discovery was disabled, is there anything else different you can tell us about their configuration? Even if it seems small, such as: were SFP ports plugged in, or were services or ports enabled/disabled that differ from the other switches? etc.
These switches all have an identical configuration (we provision them with a script), including which ports are in use. The only differences are that some have unused ports disabled while others do not, and their SNMP server location strings differ. No SFP, LACP, or LAGs in use on any of these.
That being said, since you mentioned you saw that vtss_appl had large memory consumption in a few instances: for these switches, did they have any of the following attributes? SFP ports plugged in? LAGs or LACP configured? Is there anything in the logs related to these services?
No SFP ports in use on those hosts, no LAGs or LACP configured either. The only events in their logs are DHCP lease renewals.
For the switches that continued showing memory loss where the offending process was not vtss_appl: can you tell us which process it was? ps aux is fine for this kind of check; on a production unit, that's the best that can be done, and knowing which process it is will definitely help narrow down the issue.
Units still showing memory loss don't yet seem far enough along for any other process to stand out as a blatant outlier. I'll continue monitoring and will report back if I observe abnormal growth on any of them.
Just in case it's still SNMP that's causing problems: if you have a Linux box lying around anywhere, you can try running this in a loop to expose the problem more blatantly.
I could set up our lab for this after we get some results on the 1.5.17rc2 host, as mentioned further up. Though I'd note that with the fixes introduced in 1.5.15~16 for SNMP, the overwhelming majority of our hosts on that FW (which have not had Discovery enabled since their last reboot) no longer show SNMP-related memory leak issues. (Another huge thanks for that one, btw.)