Garet wrote:On every boot now the switch locks up for approximately 10 minutes. I call this a lock up because, unlike the normal boot process, the switch consistently errors after the lockup clears. The error is always the previously mentioned PHP error repeated many times, after which the CLI presents as if nothing happened. This first occurred after I rebooted the switch via the RS232 interface.
OK, that is definitely not correct behavior. The bootup cycle does take a few minutes, but not 10, so that's anomalous.
Garet wrote:1. Switch arrives from RMA
2. Switch boots normally, all ports negotiate 1G, unable to access management console, connected directly to switch with
an adapter that has a static IP on the same net as switch.
3. Enabled DHCP, can now connect to switch from its IP but only on a Windows machine
4. Performed Bench test (see previous posts)
5. Noticed switch would not acquire an address via DHCP
5.1. Manually set address to a static IP via RS232. Switch did not take the IP and did not revert to its default static IP
5.2. Rebooted switch from CLI
5.3. Switch sat locked up for several minutes however character echo back over RS232 was still functional
5.4. Switch finally booted, this was the first time the PHP error occurred.
5.5. Switch still did not acquire a DHCP address or revert to default static IP
5.6. Attempted reboot via DEF button (green circle), same issue as 5.3 to 5.4
That's helpful, I will try this order of events and see what happens.
My guess is that around 5.1, when the switch refused to take the static IP change via the CLI, something got corrupted, which is why it failed to revert and had continuous problems afterward. Like you say, it might have been something like a held mutex waiting for something else to finish, which prevented the entered IP value from making it to the config file and may have nulled the IP in the config, causing this havoc. However, if that's correct, then the defaulting process is still a separate problem, as it should have fixed it.
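To make that theory concrete, here is a minimal Python sketch of the kind of failure I have in mind. It is not the switch's firmware and I have not confirmed the firmware works this way. The names `config`, `config_lock`, `set_static_ip`, and `simulate` are all made up for illustration, and the IP addresses are arbitrary. The assumption is a writer that discards the old IP before it actually holds the lock, then gives up when the lock is busy.

```python
import threading

# Hypothetical in-memory stand-in for the switch's config file.
config = {"ip": "192.168.0.1"}
# Hypothetical lock guarding the IP entries (could equally be a semaphore).
config_lock = threading.Lock()


def set_static_ip(new_ip, timeout=0.1):
    """Flawed writer: clears the old value before it holds the lock."""
    config["ip"] = None  # old value is discarded first
    if not config_lock.acquire(timeout=timeout):
        return False  # gave up waiting; the config now holds a null IP
    try:
        config["ip"] = new_ip
        return True
    finally:
        config_lock.release()


def simulate():
    # Pretend a stalled background process (for example a DHCP client
    # still waiting on a lease) is holding the lock during the manual change.
    config_lock.acquire()
    try:
        ok = set_static_ip("10.0.0.50")
    finally:
        config_lock.release()
    return ok, config["ip"]


result = simulate()
print(result)  # (False, None): the write failed and the old IP is gone
```

In this sketch the manual change neither takes effect nor leaves the old address in place, which matches the symptom at 5.1. Once the lock is free, the same call succeeds, so the flaw only shows up under contention.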
Garet wrote:...I hope it's clear why I would have to be extraordinarily unlucky to corrupt something...
Regardless, this has not happened to others as far as I know. Please don't forget that 99% of the operation of the switch is done automatically. From the moment it is plugged in, booted, and made aware of your specific network, there are dozens of processes already working and making decisions about what action the switch should take to pass information as optimally as possible. Any one of these processes could have locked the config via a mutex, spinlock, or (most likely) semaphore, etc., and caused the corruption when the IP was manually entered at step 5.1.
Right now, my guess is it was the dhcpcd process. It was having trouble getting an IP address for a presently unknown reason, and it may have locked the entries where IP addresses are supposed to go, so that the manual attempt caused the issue.
Another point is that I've done this almost the same way you have many times and never seen something like this happen before. Hence I think it must be a scenario like the one above, where dhcpcd or a similar process is conflicting with the entries.
Either way, I'm looking into it. We are pretty backlogged right now, but I'd like to get to the bottom of this one.