12AC Switch Glitch

rockhead
Experienced Member
 
Posts: 119
Joined: Mon Aug 04, 2014 7:09 pm
Has thanked: 53 times
Been thanked: 35 times

12AC Switch Glitch

Sun Aug 02, 2015 1:44 pm

Flippin furry freakout time :willy:

Yesterday I dropped in a new 12A -AF24--Af24-12AC collection of kit. Things rolled smoothly for 20 ish hours and then suddenly no data on the AF5X that continues the backhaul. Wireless access to the radio still worked, it claimed to have a LAN connection. Log into the 12AC from the outside and it seems fine, claims to have a gig connection but zero data moving.
I bounced the port and after the radios relinked all is well, but for how long ?
Switch running 1.2.4

sirhc
Employee
 
Posts: 7415
Joined: Tue Apr 08, 2014 3:48 pm
Location: Lancaster, PA
Has thanked: 1608 times
Been thanked: 1325 times

Re: 12AC Switch Glitch

Sun Aug 02, 2015 3:18 pm

So why do we think it is a glitch with the switch and not the AF5?

What testing have you done to determine this is a "Glitch with the WS-12-250-AC"?

Statements like this are what lead to hysteria and cause people to get trampled leaving a building because someone smelled a whiff of smoke from somebody smoking in the boys room crapper stall and yelled "FIRE".

Maybe it is our switch, maybe you have a bad unit, it is possible but so far you have no evidence who or what is at fault but you have classified it as a "Glitch in the Switch", WTF?

4,000 WISP Switches in service and we are on RMA #19. Of those, 4 or 5 were "upgrade the firmware and return", 3 are open and never received, and 2 were very apparent lightning damage. So what is that, 9 out of 4,000, a failure rate of roughly 0.2%.

Number of actual software bugs that were our fault thus far: fewer than 10, NOT INCLUDING SFP COMPATIBILITY CHANGES.

So far I have had to RMA and replace 4 AF24 radios out of 24 over the past 3 years, which is a failure rate of about 17%. All with Ethernet port issues.
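
For anyone who wants to check those percentages, here is a minimal Python sketch of the arithmetic. It takes the "4 or 5" firmware-only returns as 5, which is an assumption, and the function name is only for illustration.

```python
def failure_rate_percent(failed, total):
    """Return failures as a percentage of units in service."""
    return 100.0 * failed / total

# Switch RMAs from the post: 19 total, minus firmware-only returns
# (taking "4 or 5" as 5, an assumption), 3 never received, 2 lightning hits.
switch_failures = 19 - 5 - 3 - 2
switch_rate = failure_rate_percent(switch_failures, 4000)

# AF24 radios from the post: 4 replaced out of 24.
radio_rate = failure_rate_percent(4, 24)

print(f"switches: {switch_failures} of 4000 = {switch_rate:.3f}%")
print(f"AF24 radios: 4 of 24 = {radio_rate:.1f}%")
```

If the firmware-only count is taken as 4 instead, the switch figure becomes 10 of 4,000, which still rounds to a fraction of one percent.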

What was the failure Rate of a Rocket M5 Titanium again?

I have heard TONS of people complaining they are having to RMA AF5X with Ethernet issues where they fail to keep a 1G link AND THEY ARE NOT USING OUR SWITCHES.

Did you look in the switch log to see if there was anything there, SUCH AS A FALSE LOOP DETECTION?

If so then Disable Loop Protection and wait for us to determine if this is our fault or people need to investigate their other equipment.

I have several posts on here explaining how loop protection works. Depending on how some people set up their network, a false loop detection can cause the switch to disable the port, but you need to check the system log.

We do not think we are handling Loop Protection wrong, but if another piece of equipment handles the packets used to detect this incorrectly then a false loop can be detected. Is it our fault or the other equipment's fault? That is yet to be determined.

Personally I have NEVER gotten a false loop detection, but the other equipment I use and the way I set up my network are different than others'.

So far a "HANDFUL" of people are having an issue with Loop Protection, but we have not yet figured out if we are doing anything wrong or if some other equipment people are using is treating the special packets incorrectly and causing them. The jury is out.
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see if it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.

sirhc

Re: 12AC Switch Glitch

Sun Aug 02, 2015 3:38 pm

I hear some people come on here and complain they are seeing Ethernet errors, so it must be the switch's fault?

I would like to know where they buy crystal balls from because I need 2.

Could it be their cabling, possibly their crimps, maybe the surge suppressor or even the radio they are talking to? COULD BE, but it is much easier to blame the switch first I guess.

Well you can see below I have towers up for months with 2, TWO I SAY, errors in going on 2 months.

The switches are made in a cookie cutter style process (not in my garage or my bathtub) so they are all the same so why can I get them to work so well if there are SOOOOOO many issues with them?

How do we know those 2 CRC errors shown below are my switch's fault? They are Rx CRC errors, so it is more likely that the AF24 is at fault.

That is 2 CRC Errors out of 32+ Terabytes of data transfer.

2 OUT OF 39 BILLION PACKETS HAD AN ISSUE. - THATS AWESOME!!!
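
To put those counters in perspective, here is a rough Python sketch that turns them into an error ratio. It treats "32+ Terabytes" as exactly 32 decimal terabytes and "39 billion" as exactly 39e9, both of which are rounding assumptions on my part.

```python
crc_errors = 2
packets = 39e9          # "39 billion packets" from the post
bytes_moved = 32e12     # "32+ terabytes", taken as decimal TB (assumption)

error_ratio = crc_errors / packets          # fraction of packets with a CRC error
packets_per_error = packets / crc_errors    # packets moved per CRC error
avg_packet_bytes = bytes_moved / packets    # rough average packet size

print(f"error ratio: {error_ratio:.2e}")
print(f"one CRC error per {packets_per_error:.3g} packets")
print(f"average packet size: {avg_packet_bytes:.0f} bytes")
```

That works out to one CRC error for every 19.5 billion packets.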


So if there is something wrong with the switch design, how come I and so many others have no issues but a few do, and thus to them it "MUST" be the switch design or firmware?

First off, the firmware has NOTHING to do with Ethernet errors. The switch core is a self-contained package; what we write is the UI and CLI, which just configure the basics of the core. The rest is written by Vitesse, and this package has been on the market for several years and is used in many other switches, from Cisco to Level1 to telecom switches. I am pretty sure that if there was something wrong with the core functions it would have been discovered and fixed in the 3 years since this package has been on the market.

We can mess up VLAN configurations, we can mess up "settings" but as far as Ethernet Errors....NO

There are some routines that reside on the Linux CPU, such as Loop Protection. Our routine produces a special packet and sends it out a switch port, and we tell the switch core to send a copy of any such special packet back to us if it sees one, so we can analyze it and determine whether a loop is out there. If there is, we tell the core to disable a port, and after a period of time passes the routine instructs the core to enable the port again.
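
As a rough illustration of that routine, here is a toy Python model. It is not the actual firmware: the class name, the frame fields, the hold time, and the choice to disable the port the probe was sent on are all assumptions made for the sketch.

```python
class LoopProtectionSketch:
    """Toy model of the CPU-side routine described above (not Netonix code)."""

    def __init__(self, switch_id, hold_seconds=180):
        self.switch_id = switch_id
        self.hold_seconds = hold_seconds      # hypothetical re-enable delay
        self.disabled_until = {}              # port -> time the port may come back

    def make_probe(self, port):
        # The "special packet": tagged with our own ID and the port it left on.
        return {"type": "loop-probe", "origin": self.switch_id, "sent_on": port}

    def on_probe_copy(self, frame, now):
        # The switch core hands the CPU a copy of any probe it sees.
        if frame.get("type") != "loop-probe" or frame.get("origin") != self.switch_id:
            return None                       # someone else's probe, ignore it
        port = frame["sent_on"]
        self.disabled_until[port] = now + self.hold_seconds
        return port                           # caller tells the core to disable this port

    def ports_to_reenable(self, now):
        # After the hold time passes, tell the core to enable the port again.
        ready = [p for p, t in self.disabled_until.items() if now >= t]
        for p in ready:
            del self.disabled_until[p]
        return ready


lp = LoopProtectionSketch("wisp-1", hold_seconds=60)
probe = lp.make_probe(port=2)
tripped = lp.on_probe_copy(probe, now=0)      # our own probe came back
early = lp.ports_to_reenable(now=30)          # hold time not yet expired
late = lp.ports_to_reenable(now=60)           # hold time expired
print(tripped, early, late)
```

The point of the model is only that the CPU sees copies of its own probes and toggles port state; it never touches ordinary traffic.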

We can tell the switch core to use Auto, 100M, 1000M stuff like that. It is a simple command we issue to tell the core to use a specific mode.

We do not look at every packet, that is a ROUTER NOT A SWITCH CORE.

We configure the core and then leave it alone and it does its thing.

We poll it for stats then make those stats available to you via SNMP or a pretty display on the UI. That is it, exactly the same as any switch manufacturer does.

I see people say our CPU utilization is low. OF COURSE IT IS LOW, all the CPU does is run the UI, CLI, and stuff like that. The transfer of packets is done in the "Switch Core", which is hard coded software; we cannot change it.

Look, if you put 10 NICs in a computer and then make a switch out of it using Linux, then the computer CPU touches every packet. This is called a "SOFT SWITCH", where it is all done via the computer CPU. This is NOT what a WISP Switch, a ToughSwitch or an EdgeMAX switch is; they are switch cores with a UI to configure them. The switch core and its code are made by Vitesse or Broadcom.

[Attached images: Ether.jpg, Ether2.jpg, temp.jpg]

sirhc

Re: 12AC Switch Glitch

Sun Aug 02, 2015 4:34 pm

So in closing, do not "FREAK OUT", "investigate", break it down into small steps and decide what it could be then report back as much information as you can so we can help.

Find a paper bag somewhere and breathe into it until you calm down, then start your investigation.

1) Look in the Switch Log and see if there is anything about a port being disabled such as Loop Protection, if so disable it and wait for the verdict if we are at fault or not. If we are not then you need to investigate further into your network topology and other equipment used or simply decide not to utilize Loop Protection (TURN IT OFF).

2) Try moving the device to another port and see if the problem follows the cable/device to the next port, if it does then I doubt it is the switch!

3) Lastly if all else fails try replacing the radio and see if the problem stops.
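
The three steps above can be sketched as a small Python decision function. The function name, the return strings, and the idea of spotting a trip by searching log lines for the word "loop" are illustrative assumptions, not the switch's actual log format.

```python
def triage(log_lines, follows_device_to_new_port, radio_swap_fixed_it):
    """Walk the three checks in order and return a plain-text next step."""
    # Step 1: anything in the switch log about loop protection?
    if any("loop" in line.lower() for line in log_lines):
        return "Loop Protection tripped: disable it and review the topology"
    # Step 2: did the fault move with the cable/device to the new port?
    if follows_device_to_new_port:
        return "Fault follows the cable/device: suspect cable or radio, not the switch"
    # Step 3: last resort, swap the radio.
    if radio_swap_fixed_it:
        return "Radio swap cured it: suspect the original radio"
    return "Inconclusive: post the config and logs on the forum"


# Hypothetical example: empty log, fault moved with the cable.
print(triage([], follows_device_to_new_port=True, radio_swap_fixed_it=False))
```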

If any of you know anything about me, it is that if I am at fault I openly admit it. So far I have not found any design flaws in our switch and I have been using them for 1 year now.

Also if you HARD CODED the speed and duplex, DO NOT, because if the cable gets warm from the sun then the resistance and crosstalk change and THE LINK CAN BECOME UNSTABLE AND FAIL. Ethernet was designed to auto-negotiate to deal with changing environments such as cable characteristics, ground potential differentials and so on.

This is yet another reason why you should always bond tower ground to service ground...ALWAYS! Else your Ethernet cable is the bond.

People should Google Ethernet Transformers Magnetic Coupling and read several articles on this; they all talk about GROUND POTENTIAL DIFFERENCES. Just like most wireless chipsets were NOT designed for outdoor use, there are no Ethernet transformers designed to deal with more than 15 ohms of ground potential difference. Get to 20 to 25 ohms and POP go your Ethernet transformers. This can easily occur if your tower ground is not bonded to your electric service ground.

GROUND IS NOT GROUND, and a ground potential difference drives CURRENT, and current can kill things.

YOU SHOULD NEVER HARD CODE SPEED AND DUPLEX IN MY OPINION EXCEPT AS A LAST RESORT TO SOLVE AN ISSUE, BUT THEN YOU RISK HAVING A SITUATION WHERE A PORT LINK FAILS AND THE LINK CANNOT RE-NEGOTIATE AT THE SPECIFIED RATE AND IT BECOMES INACTIVE.

A LINK CAN BECOME CONFUSED, AND SINCE IT CANNOT STEP DOWN IT GETS LOST AND THE ONLY WAY TO BRING IT BACK UP IS TO BOUNCE IT.

rockhead

Re: 12AC Switch Glitch

Sun Aug 02, 2015 5:01 pm

Why do I suspect the switch and not the AF5X? How about because the X and its cable have been in place for many undramatic months.
Yesterday I dropped in a new switch, an express replacement of another Netonix. Rather than do updates on the old switch and interrupt operations, I updated a brand new unit, configured it and then swapped the two.

I wasn't looking to slag your gear Chris, I was more hoping for somebody to say "oh hey do this, don't do that". There was nothing in the log and one lonely error showing on the interface.

As far as not freaking out goes, it can't be helped: when 90% of my clients get randomly disconnected I find it very alarming.

sirhc

Re: 12AC Switch Glitch

Sun Aug 02, 2015 6:23 pm

rockhead wrote:Things rolled smoothly for 20 ish hours and then suddenly no data on the AF5X that continues the backhaul. Wireless access to the radio still worked, it claimed to have a LAN connection. Log into the 12AC from the outside and it seems fine, claims to have a gig connection but zero data moving.
I bounced the port and after the radios relinked all is well, but for how long ?
Switch running 1.2.4


This sounds like one of the 2 following issues:

1) Was this port hard coded to a speed and duplex? If so, I strongly advise against it.

2) Did you check the Switch Log to see if the port was disabled by a false Loop Detection? If you see the port was disabled by a Loop Detection, disable Loop Protection and let us know.

If it is a Loop Detection that caused the switch to shut down the port then you can disable Loop Protection but you really want to ask why it thought that.

As I said we are still investigating this issue but it is simple to disable it. Go to the Device/Configuration Tab and uncheck Loop Protection.

Loop Protection works differently than RSTP: it sends out a special packet, and if that packet comes back to the switch, a loop is detected. It does not just mean there is a loop between this switch and another switch, as with RSTP; it means somewhere out there past this port is a loop, maybe on another switch.

Meaning if you feed another switch from port 2 of a WISP switch, and then on the other switch you connect its port 10 to, say, port 20, that special packet will make it back to the sending switch, which will sense that there is a loop out there and shut down the port. This special packet propagates as far as it can on the Layer 2 segment, and if somewhere out there is a loop this routine will pick it up.
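
The port 2 / port 10 / port 20 example can be modeled with a small Python flood simulation. This is only a sketch of the idea under simplifying assumptions: every switch floods the probe out all ports except the one it arrived on, each port transmits it at most once, and the switch names and helper functions are hypothetical.

```python
from collections import deque

def probe_comes_back(links, ports, origin, out_port):
    """Flood one probe from (origin, out_port); True if it reaches origin again."""
    sent = {(origin, out_port)}               # (switch, port) pairs already transmitted on
    queue = deque([(origin, out_port)])
    while queue:
        switch, port = queue.popleft()
        peer = links.get((switch, port))      # what is plugged into this port?
        if peer is None:
            continue                          # nothing connected, frame dies here
        peer_switch, ingress = peer
        if peer_switch == origin:
            return True                       # probe made it back: loop detected
        for egress in ports[peer_switch]:     # flood out every port except ingress
            if egress != ingress and (peer_switch, egress) not in sent:
                sent.add((peer_switch, egress))
                queue.append((peer_switch, egress))
    return False

def cable(links, a, b):
    """Record a cable between two (switch, port) ends, in both directions."""
    links[a] = b
    links[b] = a

ports = {"wisp": [1, 2], "other": [1, 10, 20]}
links = {}
cable(links, ("wisp", 2), ("other", 1))       # WISP port 2 feeds the other switch
no_loop = probe_comes_back(links, ports, "wisp", 2)
cable(links, ("other", 10), ("other", 20))    # patch port 10 to port 20: a loop
with_loop = probe_comes_back(links, ports, "wisp", 2)
print(no_loop, with_loop)
```

Without the patch cable the probe dies on the unconnected ports; with it, the probe re-enters the other switch and is flooded back up the uplink to the sender.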

We had a problem with Loop Protection packets getting across VLANs which we fixed but maybe another device out there lets these special packets bleed across VLANs and causes these detections...... as I said we are exploring this issue now.

sirhc

Re: 12AC Switch Glitch

Sun Aug 02, 2015 6:37 pm

If it happens again I would still disable Loop Protection as it is under investigation.

Just because the AF5 has been there for months does not mean it did not have the hiccup, just saying.

Beyond that you could post up your config tabs and explain the configuration and I will see what I can come up with.

rockhead

Re: 12AC Switch Glitch

Sun Aug 02, 2015 7:22 pm

The one place where I can imagine a loop existing, real or imagined, is amongst a trio of UniFi APs: one is connected directly to the L2 bridge and the other two by UniFi wireless uplink.

I did not see Loop in the log, and I have since dumped the log because it was overrun with (presumed) Chinese login attempts once connected directly to the web. I will keep an eye on it and drop the loop detection early tomorrow AM. The config is extremely basic: no VLANs or trunks, and I haven't even turned on STP yet.

rockhead

Re: 12AC Switch Glitch

Mon Aug 03, 2015 6:08 pm

Well, thirty hours on and all is well so far. I did remove the loop protection in this morning's tweak session and also enabled RSTP to cover my AF24. I sure do wish I had a repeatable problem, but for now I guess it's wait and watch.
