Potential bugs in WS3 firmware causing strange behaviour
-
Stephen - Employee
- Posts: 1033
- Joined: Sun Dec 24, 2017 8:56 pm
- Has thanked: 85 times
- Been thanked: 181 times
Re: Potential bugs in WS3 firmware causing strange behaviour
Hey Garnet, this is the highest-priority item on the bug list, but unfortunately there are a few other items I'm working on for the factory that have to be completed first. When I make some progress on it I will post updates in this thread.
Re: Potential bugs in WS3 firmware causing strange behaviour
Hi Stephen,
Thank you for the update. Are these other items something we could help expedite by doing testing on our end? We have extensive experience in both the low-level and high-level aspects of these devices at our disposal.
Re: Potential bugs in WS3 firmware causing strange behaviour
Please let me know if there is any additional testing we can do to expedite bug fixing.
Re: Potential bugs in WS3 firmware causing strange behaviour
Update for 2021-07-12: additional testing was performed on the WS3.
1. Switched to backup firmware (2.0.6rc4, slot #1)
2. Attempted config restore
2.1. Switch rebooted
2.2. PHP warning on boot: "file_get_contents(/tmp/config/interface.1): failed to open stream: No such file or directory in /www/config.php on line 601"
2.3. Signed into switch and checked config; config restored successfully
Working theory at this point: firmware slot #0 is corrupt.
3. Performed firmware upgrade with slot #1 selected. As per the switch's pop-up message, this should overwrite slot #0 with a fresh copy of 2.0.6rc4
3.1. Upon switch reboot, attempted restore of the last known full configuration file (the intended production config)
3.2. Switch rebooted, then locked up for approx. 10 minutes before giving the same PHP errors as previously mentioned in this thread
3.3. Switch is set to static IPv4 with an address of 192.168.1.20, but the switch does not have an IP address
Working theory at this point: firmware slot #0, or the mechanism that overwrites it, is corrupt / damaged.
To rule out a corrupt firmware file, I downloaded a fresh copy and compared its sha256 hash to that of the file I have been using. As expected, the hashes match.
4. Factory defaulted via RS232 interface
5. Switched to firmware slot #1
6. Attempted config restore again (same file as previously)
6.1. Switch locked up for approx. 10 minutes, same PHP error after lockup cleared, switch did not have an IP address
6.2. Factory defaulted from RS232 interface
7. Attempted config restore of the file that worked previously (slot #1 selected)
7.1. Config restored successfully
This additional testing appears to highlight two issues with the unit:
1. Firmware slot #0 is functionally different from slot #1
2. There are still unidentified 'triggers' in the configuration that can cause a switch lockup
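For anyone wanting to repeat the firmware file check mentioned above, the sha256 comparison can be sketched in Python as below. This is only a minimal sketch, not the exact commands used in the test; the function names and the file names in the usage note are placeholders.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large firmware images are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_firmware(path_a, path_b):
    """True if both files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Calling same_firmware("firmware_in_use.bin", "firmware_fresh_download.bin") (both names hypothetical) returns True when the two files are byte-for-byte identical as far as SHA-256 can tell.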
Re: Potential bugs in WS3 firmware causing strange behaviour
Following my previous post, I attempted to test my theory that firmware slot #1 was functioning differently from slot #0. The following takes place immediately after point 7.1 of my previous post.
1. Enabled DHCP
1.1. Switch did not get an IP, rebooted
1.2. Upon reboot, switch received a DHCP lease
2. Signed into switch (via direct ethernet connection)
3. Renamed switch ports to match intended production config
3.1. Config change successful
4. Seemingly at random, switch lost its IP address
4.1. 'ip address' command reports switch IP is 169.254.73.148
Conclusion:
Switch continues to exhibit seemingly random glitches and issues.
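A note on the address in 4.1: 169.254.0.0/16 is the IPv4 link-local range (RFC 3927), so an address like that usually suggests the interface currently has neither a DHCP lease nor its static address applied. A minimal Python sketch of that check, using the standard ipaddress module (the function name is just illustrative):

```python
import ipaddress


def is_link_local(address):
    """True if the address (with or without a /prefix) is link-local."""
    # ip_interface accepts both "169.254.73.148" and "169.254.73.148/16".
    ip = ipaddress.ip_interface(address).ip
    return ip.is_link_local
```

With this, is_link_local("169.254.73.148") is True, while the static address from the earlier test, is_link_local("192.168.1.20"), is False.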
Re: Potential bugs in WS3 firmware causing strange behaviour
It would be greatly appreciated if someone could let me know whether this is an issue unique to our WS3 or if others are having issues. I can have it sent back for RMA again if need be.
In the meantime I hope these latest updates aid in bugfixing for the next firmware release.
-
Stephen - Employee
Re: Potential bugs in WS3 firmware causing strange behaviour
Hey Garnet, I know you've had to wait a while for me to get back to this. I apologize, but it was unavoidable.
Fortunately, I can finally start working on this again. I've spent the whole day testing this and here is what I've found so far:
My network setup is simple: the WS3 passes through another WS switch connected to a Netgear router acting as the DHCP server.
You can see the results in the logs here:
I tried this many, many times using the provided config along with a few variations I set up, and I was still unable to replicate the issues you are showing (that is to say, DHCP provided a lease successfully 100% of the time for me). So I figured what we must be seeing (such as the weird hang times on boot, random reboots, etc.) must be occurring as a result of something going wrong at this early stage.
Eventually I was able to find out what could cause this.
If I disconnect the network cable from my switch and change from static to DHCP directly from the console port, then I am finally able to replicate the issues you've been mentioning.
The first thing that happened when DHCP inevitably failed to obtain a lease is that the results from
Code:
show status
would sometimes show the IP address as "Unknown".
Other times it would catch it when the system reverts to the 169 default IP.
And using the system command
Code:
ip addr show
or
Code:
ifconfig
showed the IP address to be 169.254.195.50/24 for the vtss.vlan.1 interface, which is a default private IP built into the network driver.
Here are the logs of the event:
Here are the results from the above commands (by the time I captured them, it was an instance where it caught the default IP):
As I started using the switch after this, I also started noticing some strange issues creep up. Reboots might hang for a while, and I've seen a few random reboots during normal operation.
After looking closer, this makes sense: parts of the config and status for the switch's firmware are not in sync with the underlying OS, which means null pointers are just waiting to be stepped on depending on events in the system.
So this is definitely a bug. If the DHCP lease acquisition fails, I would prefer it to take on the default IP address for the switch and not the one built into the firmware.
However, even if I fix this, it won't solve your original problem. I cannot see any reason why DHCP itself would fail without there being some sort of connectivity issue between the WS3 and the DHCP server. Or possibly a compatibility issue, if it's maybe a new version of the DHCP server itself and they've changed the protocol? I really doubt that though.
It is still possible there is something wrong with that unit. If you want, you can RMA the switch and I will have the factory redirect it to me personally, and I will run it through the exact same series of tests I've done on the one I'm currently using.
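The preferred behaviour described above (fall back to the switch's configured default IP rather than the driver's 169.254.x.x address when no lease is obtained) can be sketched roughly as follows. This is an illustration in Python only, not the firmware's actual code; the function name, its arguments, and the 192.0.2.10 address in the usage note are all assumptions made up for the example.

```python
import ipaddress

# IPv4 link-local range (RFC 3927), where the driver's fallback address lives.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def choose_address(lease_ip, default_ip):
    """Pick the address to apply after a DHCP attempt.

    lease_ip is None when no lease was obtained (hypothetical convention).
    A link-local "lease" is treated the same as no lease at all.
    """
    if lease_ip is not None:
        candidate = ipaddress.ip_address(lease_ip)
        if candidate not in LINK_LOCAL:
            return str(candidate)
    # No usable lease: fall back to the configured default address.
    return str(ipaddress.ip_address(default_ip))
```

For example, choose_address(None, "192.0.2.10") returns "192.0.2.10", so the rest of the system would always see an address that matches the stored configuration.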
Re: Potential bugs in WS3 firmware causing strange behaviour
Hi Stephen,
Thank you for the comprehensive write-up. To clarify, your testing showed that after issuing a command over the console port (changing DHCP settings) some strange issues would occur?
If this is the case, do you know how I could reverse the issue?
I'm not worried about DHCP issues anymore, as the issues now only seem to appear when the switch itself is at fault (e.g. locking up, or otherwise glitching).
If you are right and it is some sort of desynchronization that is triggered by specific console commands under what appears to be an edge case, then I would be happy to take your advice on how to 'resync' the switch. In other words, if you think it's a software issue I could fix without opening the switch case, then I'll hold off on the RMA until I can try the fix.
Thanks again.
Re: Potential bugs in WS3 firmware causing strange behaviour
Update:
I have just been informed that a prior issue with the switch is still occurring. I haven't had reason to sign into the test router during my testing and as such was not aware. The switch acquires two IPs via DHCP: the first address, 'estax-71-02-dc', appears almost immediately after the switch boots. Approx. twenty seconds later the second address, 'Netonix_Switch', appears.
The 'estax' address appears to be valid for a few seconds; however, it does not respond to pings (ping requests time out instead of returning as unreachable).
Note that the MAC of both IPs is identical, and 'estax-71-02-DC' matches the MAC of the switch.
Photo is attached.
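For anyone wanting to check their own router for the same symptom, a quick way to spot it is to group the DHCP lease table by MAC and flag any MAC holding more than one IP. A minimal Python sketch, assuming the lease table has already been exported as (mac, ip, hostname) tuples; that format and the function name are assumptions, and the MAC and IP values in the usage note are placeholders, not the real ones from our router.

```python
from collections import defaultdict


def macs_with_multiple_leases(leases):
    """Return {mac: [(ip, hostname), ...]} for MACs holding more than one IP."""
    by_mac = defaultdict(list)
    for mac, ip, hostname in leases:
        # Normalise case so "…:DC" and "…:dc" count as the same MAC.
        by_mac[mac.lower()].append((ip, hostname))
    return {
        mac: entries
        for mac, entries in by_mac.items()
        if len({ip for ip, _ in entries}) > 1
    }
```

Feeding it entries such as ("00:00:5e:00:53:dc", "192.0.2.50", "estax-71-02-dc") and ("00:00:5e:00:53:dc", "192.0.2.51", "Netonix_Switch") returns that MAC with both leases listed.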
Thanks,
Re: Potential bugs in WS3 firmware causing strange behaviour
Has there been any more progress made on this issue?