You make some good points:
Omniflux wrote:Making a change that requires restarting the network interfaces currently takes 4 minutes. That is fine because I have the revert timer set to the maximum (5 minutes). What will happen if the change takes longer than the revert timer?
The revert timer does not begin until the config has finished being applied. Which parts of the config changed determines what happens with lower-level things like the network service. So in your case (which I believe is the default), the switch is supposed to revert only if it does not respond for 5 minutes after it has finished reconfiguring everything, such as the network interfaces, vtss_appl, etc.
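To make the timing concrete, here is a rough sketch of the behavior described above. It is not the actual firmware code, just a model: the names (ConfigChange, should_revert, confirmed_at) are made up for illustration, and I'm assuming "the switch responds" can be represented as a single confirmation timestamp.

```python
from dataclasses import dataclass
from typing import Optional

REVERT_TIMEOUT_S = 5 * 60  # the 5-minute maximum from the post (assumed to be in seconds here)


@dataclass
class ConfigChange:
    """Hypothetical record of one config change; all times are seconds on one clock."""
    apply_started_at: float
    apply_finished_at: float
    confirmed_at: Optional[float] = None  # when the switch was seen responding, if ever


def should_revert(change: ConfigChange, now: float,
                  timeout_s: float = REVERT_TIMEOUT_S) -> bool:
    """Return True once the revert timer has expired without a response."""
    # The timer has not started while the config is still being applied.
    if now < change.apply_finished_at:
        return False
    # The countdown is measured from the end of the apply, not the start.
    deadline = change.apply_finished_at + timeout_s
    # A response before the deadline cancels the revert.
    if change.confirmed_at is not None and change.confirmed_at <= deadline:
        return False
    return now >= deadline


# A change that takes 4 minutes to apply (network interfaces restarting).
slow_change = ConfigChange(apply_started_at=0.0, apply_finished_at=240.0)
print(should_revert(slow_change, now=300.0))  # 5 min after start, 1 min after finish
print(should_revert(slow_change, now=540.0))  # 5 min after the apply finished
```

The point of the model is that a slow apply does not eat into the revert window, so a change that takes longer than the timer to apply should not trigger a revert on its own.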
Omniflux wrote:Are you considering updating to something based on a more recent version of OpenWRT using netifd and procd? I suspect this would resolve this issue, although that is a major change for an admittedly minor inconvenience.
This is a possibility I explored in depth with Eric a while ago. Unfortunately, although this is a variant of OpenWRT, when Vitesse built the example board with the switch-core (an ASIC that does all the hardware switching and most of the heavy protocol work) as well as the MIPS32 CPU (which runs all the normal software, including the OS/OpenWRT variant), they broke some of what might be considered convention in the Linux world. Specifically, the kernel and vtss_appl (which is primarily responsible for configuring the switch-core/ASIC) run nearly as one, even though vtss_appl technically exists in userspace (i.e. it is not a kernel module).
After reading the documentation Vitesse created for the original development board, I can completely understand why they needed to do it this way, and I suspect many hardware-based networking technologies have been forced to use a similar design paradigm. It has still resulted in complications like this that make it very difficult to upgrade the OS itself, which, unlike the main branch of OpenWRT, was never intended to be deployed across multiple hardware architectures. We make do by upgrading some of the packages it uses, like https, ssl, snmp, etc., for security and compatibility. And for the occasional issue like this, someone like me has to go in with a scalpel or a hammer (depending on the severity) to keep it all together.
What I might eventually try to do is integrate some of the newer tools like netifd and procd into our version and rebuild the relevant parts of our system around them. I've actually tried doing this already with gdb SEVERAL times, because it could take a lot of the guesswork out of development, but unfortunately one of the limiting factors of the hardware is memory, and the build fails. At the same time, we have other products we are trying to complete and our resources are limited. As any of my seniors can tell you, this is a balancing act. Because of that, I can't make a decision like that completely on my own either.
Omniflux wrote:I will try to get out to the site late tomorrow night.
That would be great, thank you.