monitor restarting vtss_appl (1.3.9)

tma
Experienced Member
 
Posts: 122
Joined: Tue Mar 03, 2015 4:07 pm
Location: Oberursel, Germany
Has thanked: 15 times
Been thanked: 14 times

monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 8:20 am

Sometimes, when I change something unrelated to LAGs and STP, I get an email telling me that a LAG has become active, even though it should never have gone down in the first place. The log has this:

Code:
Jun 8 12:28:25 UI: Configuration changed by 194.x.x.x
Jun 8 12:28:25 UI: Port 6 PoE: changed from '48V' to 'Off'
Jun 8 12:28:33 monitor: restarting vtss_appl
Jun 8 12:28:35 STP: set port 21 to discarding
Jun 8 12:28:35 STP: set port 22 to discarding
Jun 8 12:28:35 STP: set port 23 to discarding
Jun 8 12:28:35 STP: set port 24 to discarding
Jun 8 12:28:35 STP: set port 22 to learning
Jun 8 12:28:35 STP: set port 22 to forwarding
Jun 8 12:28:35 STP: set port 22 to discarding
Jun 8 12:28:35 STP: set port 21 to learning
Jun 8 12:28:35 STP: set port 21 to forwarding
Jun 8 12:28:36 LACP: starting negotiation with partner EC-13-B3-01-48-EC
Jun 8 12:28:36 LACP: starting negotiation with partner 1C-BD-B9-DD-67-1A
Jun 8 12:28:37 STP: set port 24 to learning
Jun 8 12:28:37 STP: set port 24 to forwarding
Jun 8 12:28:37 STP: set port 23 to learning
Jun 8 12:28:37 STP: set port 23 to forwarding
Jun 8 12:28:38 LACP: LACP changed state to Active on port 22 (key 1)
Jun 8 12:28:38 LACP: LACP changed state to Active on port 24 (key 2)
Jun 8 12:28:38 LACP: LACP changed state to Active on port 21 (key 1)
Jun 8 12:28:38 LACP: LACP changed state to Active on port 23 (key 2)
Jun 8 12:28:38 STP: set port 27 to discarding
Jun 8 12:28:38 STP: set port 21 to discarding
Jun 8 12:28:38 STP: set port 22 to discarding
Jun 8 12:28:38 STP: set port 28 to discarding
Jun 8 12:28:38 STP: set port 23 to discarding
Jun 8 12:28:38 STP: set port 24 to discarding
Jun 8 12:28:38 STP: set port 27 to learning
Jun 8 12:28:38 STP: set port 21 to learning
Jun 8 12:28:38 STP: set port 22 to learning
Jun 8 12:28:38 STP: set port 27 to forwarding
Jun 8 12:28:38 STP: set port 21 to forwarding
Jun 8 12:28:38 STP: set port 22 to forwarding
Jun 8 12:28:40 STP: set port 27 to discarding
Jun 8 12:28:40 STP: set port 21 to discarding
Jun 8 12:28:40 STP: set port 22 to discarding
Jun 8 12:28:40 STP: set port 28 to learning
Jun 8 12:28:40 STP: set port 23 to learning
Jun 8 12:28:40 STP: set port 24 to learning
Jun 8 12:28:40 STP: set port 28 to forwarding
Jun 8 12:28:40 STP: set port 23 to forwarding
Jun 8 12:28:40 STP: set port 24 to forwarding


As the log shows, I had turned off power on port 6, which caused a restart of vtss_appl a few seconds later. That in turn caused a re-initialization of the LAG/STP setup on ports 21/22 and 23/24, interrupting traffic for 5 seconds(!) until STP had settled.
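For anyone who wants to check the same timing in their own logs, here is a minimal Python sketch. It is not a Netonix tool. It assumes the syslog line format shown in the excerpt above, assumes the year (the log lines carry none), and relies on English month abbreviations. The function and key names (`parse`, `restart_windows`, `restart_to_settled`, `stp_churn`) are made up for illustration. It finds each "restarting vtss_appl" line and measures how long STP events keep appearing afterwards.

```python
import re
from datetime import datetime

# A few lines copied from the log excerpt above.
LOG = """\
Jun 8 12:28:25 UI: Port 6 PoE: changed from '48V' to 'Off'
Jun 8 12:28:33 monitor: restarting vtss_appl
Jun 8 12:28:35 STP: set port 21 to discarding
Jun 8 12:28:38 LACP: LACP changed state to Active on port 21 (key 1)
Jun 8 12:28:40 STP: set port 21 to discarding
Jun 8 12:28:40 STP: set port 24 to forwarding
"""

# Assumed line layout: "<Mon> <day> <HH:MM:SS> <source>: <message>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\w+): (.*)$")


def parse(log_text, year=2016):
    """Turn log text into a list of (timestamp, source, message) tuples."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that does not look like a log line
        stamp, source, message = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, source, message))
    return events


def restart_windows(events):
    """For each vtss_appl restart, measure the STP activity that follows it."""
    restarts = [i for i, (_, src, msg) in enumerate(events)
                if src == "monitor" and "restarting vtss_appl" in msg]
    windows = []
    for n, start in enumerate(restarts):
        # Only look at events up to the next restart (or the end of the log).
        end = restarts[n + 1] if n + 1 < len(restarts) else len(events)
        stp_times = [when for when, src, _ in events[start + 1:end] if src == "STP"]
        if not stp_times:
            continue
        restart = events[start][0]
        windows.append({
            "restart": restart,
            # seconds from the restart line to the last STP state change
            "restart_to_settled": (stp_times[-1] - restart).total_seconds(),
            # seconds between the first and last STP state change
            "stp_churn": (stp_times[-1] - stp_times[0]).total_seconds(),
        })
    return windows


if __name__ == "__main__":
    for w in restart_windows(parse(LOG)):
        print(w["restart"].time(), w["restart_to_settled"], w["stp_churn"])
```

On the excerpt above, the first and last STP events are 5 seconds apart (12:28:35 to 12:28:40), and the last one comes 7 seconds after the restart line.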

Also, although the switch is properly configured for SMTP, it sent no message for the "LACP changed state to Active" event, while the twin switch on the other end of the port 21/22 LAG did send one. The twin switch did not restart vtss_appl - it was a victim of whatever went wrong here - so I guess the message was not sent because vtss_appl had just started.
--
Thomas Giger

sirhc
Employee
 
Posts: 7421
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1609 times
Been thanked: 1326 times

Re: monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 9:40 am

v1.3.9 is no longer being developed.
If you have a problem or possible bug, please upgrade to the latest final version (in this case v1.4.1) and see if your problem goes away.

If your problem still persists, upgrade to the latest RC version (in this case v1.4.2rc6) and see if your issue goes away.

If your problem still persists after that, post the issue and we will investigate. In this case I would go straight to v1.4.2rc6, as it fixed a bug in v1.4.0 and v1.4.1 (read the release notes).
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see if it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.

tma

Re: monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 11:13 am

When customers use beta versions and run into problems, they are told it's all their fault because they should never use betas on production sites. Then, if they find a bug in the latest stable release - which is 1.3.9 for me, as 1.4.0 and 1.4.1 have bugs and 1.4.2 is not released - they are told to use a beta version ...

I will certainly not upgrade production sites to RC versions to help you debug why vtss_appl dies or hangs and needs to be restarted. I will only do field upgrades to a stable release if I see that no one else reports serious issues for a couple of weeks. That's a fixed policy.

If you had checked your internal change log (which must be much more detailed than what customers get to see) and had found something that sounds related and was fixed after 1.3.9, I would actually consider breaking my rules and giving it a try on an RC version - but not if the only indication of a fix is a higher version number. How likely is it that a bug was fixed after 1.3.9 if you don't know about it? Or is there a tooth fairy that pulls bugs from firmware in the night?
--
Thomas Giger

sirhc

Re: monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 11:25 am

Thomas, v1.3.9 is done being developed; it has been a closed development thread since January 30th, 2016.

What would you want us to do? Dig it out of the archive, look at v1.3.9 and release v1.3.9b, then back port those changes into v1.4.0, v1.4.1 and now v1.4.2?

v1.4.1 is fine if you disable the Discovery Tab, which caused HIGH CPU usage because of a bug in the CDP discovery when it received a malformed packet.

v1.4.2rc6 fixes that and a few other bugs, but for the developer to look at a bug he would be working in the code for v1.4.2rcX, as v1.3.9 is closed.

And yes, sometimes other unknown bugs are fixed while fixing known bugs: sometimes by causality, sometimes by design and sometimes by luck.

Personally I would use v1.4.2rc6, and I am using it on "all" of my towers. Normally I would be running v1.4.1 FINAL on most and v1.4.2rcX on a couple, but since I wanted the Discovery Tab I upgraded a couple of towers every day (night) until they were all running it.

If someone finds a bug, asking them to upgrade to the code currently being developed is not unreasonable. How else would we provide them a fix?

When people confirm a bug exists in the latest code, we jump right on it and work on it until it is fixed (usually hours or maybe days). How long do most other companies take to fix known bugs?

sirhc

Re: monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 11:34 am

The fix for the HIGH CPU with the Discovery Tab came from us sending the person who reported it a new v1.4.2rcX version, which he tried.

The first attempt failed to fix it, but the second firmware we sent him did, so we released v1.4.2rc6 to the masses.

This is what we would do if you have a confirmed bug in the current thread, which is v1.4.2rc6.

sirhc

Re: monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 11:39 am

Now v1.4.2rc6 is getting close to going FINAL, as we need to provide the assembly house with firmware to pre-flash the boards and we would prefer not to have 6,000 switches flashed with v1.4.1, which has known bugs that are already fixed.

So if you have a bug and want it fixed for the next FINAL release, which would be v1.4.2, you are welcome to check whether the bug is still there and, if so, report it, and we will get the fix into v1.4.2. Otherwise it will be addressed in v1.4.3rcX if/when you confirm it is in v1.4.2 FINAL.

tma

Re: monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 1:34 pm

I don't want you to release 1.3.9b. I just want you to consider that this bug may not have been fixed by luck.

As I see it, there's no debate that this is a bug - although it doesn't occur every time - because the log excerpt shows quite clearly what happened. At that point, unless you are sure that it has been fixed after 1.3.9, it should be your responsibility either to verify that it has been fixed by luck or to accept that a likely bug needs further investigation. Personally, I wouldn't work on the assumption that bugs get fixed by luck, and I certainly don't want to find out for you which bugs were fixed by luck, knowingly or not at all - not if I have to do this on the backs of my customers by testing versions that are already known to have other bugs.

I don't expect you to delay 1.4.2 either. Fix it in 1.4.3 or whenever you have found out what it is. I won't roll out 1.4.2 and later versions in my network, then, unless one of them is known to fix something I need a fix for. Actually, if you said you would look into it and then never found a fix, or even decided to silently ignore the issue, I'd feel better than being told my report will only be considered real if I can prove the bug exists in the newest RC.

Anyway. If you have trouble reproducing the bug, please feel free to get back and ask for more details that I can provide from my 1.3.9 environment. You could possibly ask whether that port 6 had STP enabled when I turned power off on it. The answer would be "no" and I would offer to send you the config file.
--
Thomas Giger

sirhc

Re: monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 2:55 pm

Thomas, I am not going to argue, but I really do not think it is unreasonable to ask people who find a bug to verify that it still exists in the latest version before we take the time to LAB it and find out.

Once people report a bug in the latest firmware thread we jump right on it and fix it immediately. We are able to do this because users do not have us spending hours chasing down bugs that "may" have already been fixed.

We get people reporting issues with firmware as far back as version 1.0.8, and they do not even take the time to read the release notes or the many PMs I have sent over the years warning people about major bugs and urging them to upgrade.

You reported a bug in a firmware (v1.3.9) that is 6 months old; many changes have been made since.

You are obviously already set up to see very quickly whether the bug exists, so installing v1.4.2rc6 on a single switch to see if the problem is fixed is minimal effort on your part. For us to set up a LAB to test your scenario may take hours, only to find out we cannot recreate it with the current version, and we are not going to set up a LAB with v1.3.9.

When I say something is fixed by luck, you should well understand what I mean: many times a bug in one section of code can affect multiple sections/functions, but those other issues were never reported and thus were never listed in the release notes as fixed.

I am sorry if you do not like our bug reporting and fixing process, but we are one of the fastest companies out there to jump on bugs and squash them. We are a small company building switches for a specific industry, so our resources are very limited, but I think we have achieved a LOT in the past 2 years with just 3 people: 1 hardware engineer, 1 software engineer, and me, whatever I am. We also allow people to call us (me) and I do my best to work with people. I do not have some lab tech guy I can tell to go LAB up Thomas's issue with v1.3.9, see if he can recreate it, then see if it is fixed in v1.4.1 or v1.4.2rcX.

tma wrote: I will certainly not upgrade production sites to RC versions to help you debug why vtss_appl dies or hangs and needs to be restarted. I will only do field upgrades to a stable release if I see that no one else reports serious issues for a couple of weeks. That's a fixed policy.


I think the above statement speaks volumes, but for the record I am not asking you to develop our code, simply to find out whether the problem you are reporting is still valid in the latest firmware.

tma

Re: monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 8:33 pm

You're putting it as if I'm too lazy to install an RC version when in fact I'm only too afraid to do that on a live site. I don't know how your business works but our only capital is reliability when every little township we're in has xDSL at 50 or 100 meg from competitors. If our net is down in one of them for 4 hours, I will lose 20 customers within the next year because the only thing they seem to remember well is outages.

I've seen this issue happen a couple of times now but only on live sites, not in the LAB, always only when an unrelated port state changes - like plugging into a port or unplugging or turning power on or off (which is about the same). It doesn't happen on sites with only one Netonix because there's no LAG and no STP. It happens in a setup with 2 DLINK and 2 Netonix switches with a LAGged STP ring across them. These are the core sites and an outage will affect multiple leaf sites and many more customers.

Because one of these events also triggered the 8 Mbps 15 Kpps issue until the engineer on site broke the LAG ring between the two Netonix, which gave us a 3-minute outage, I've changed our policy: I will no longer make a config change on a core-site Netonix switch unless an engineer is nearby and ready to pull LAG cables. And no, I wasn't able to capture evidence of the 8 Mbps thing again - I was trying to make it disappear, yelling at the poor engineer because my first thought was that he had done something wrong. I've had a look at the log now, though, and I see that vtss_appl was restarted, LAGs came up and STP ports went discarding-learning-forwarding, some more than once (why?), and then there's nothing in the log for 3 minutes while the switch was locked up.

My problem with your bug reporting and fixing process is this: Let's assume you release 1.4.2 and I wait for 3 weeks to see if it would be safe to use. Then I upgrade one site to 1.4.2 and some 3 weeks later this issue happens again. Meanwhile you will be working on 1.4.3 or 1.4.4. Will you again not accept a bug report because I'm still on 1.4.2 and it could have been fixed in the meantime?
--
Thomas Giger

sirhc

Re: monitor restarting vtss_appl (1.3.9)

Wed Jun 08, 2016 10:12 pm

Thomas, I have been an ISP since 1997 and a WISP since 1999, so I am well aware of how the WISP industry is. In fact, I was called out last night at 1AM and did not get home until 5AM because a tower was down. I am "very" familiar with the WISP industry.

Before I push a new RC version for download I roll it out to 5 towers at my WISP, all the switches in my WISP office, and all the switches at the RF Armor and Netonix facility. Over the next couple of days I roll it out to all my WISP towers. It is because I understand the WISP industry that I try it on my WISP first. That same understanding of how WISPs work and what they need is why, when Joe Portman reported a DHCP bug that he needed fixed at 11PM on the Friday night of Super Bowl weekend last year, we worked all weekend to push out a fix. I remember hitting submit at 11:37PM, just as the Seahawks threw the interception and lost.
viewtopic.php?f=17&t=240&start=40#p2809

Are we perfect? No, but we try our best.

My advice to people is to roll out a new RC or even a FINAL version to an easy-to-access switch such as one in an office or LAB, then to an easy-to-access tower, and then slowly roll it out further as they feel comfortable.

However if you find a bug and you want us to look at it you "need" to update to the latest version and see if the issue is still there and if so report it along with how to re-create it.

If everyone waits for everyone else to test everything then the only person testing anything is me and that is not so good.
