Running 1.4.7. I started getting packet loss, which I've traced to switch ports flapping. One peculiar thing is that the log only mentions ports going up, never down, yet the packet loss suggests the flapping is real. When I'm logged into the switch, it loses communication with the upstream router for a second or so several times a minute. Ports 13 & 14 are the connections to the upstream router (VLANs to different radios are divided between the two ports, but management works via either). The upstream router is a MikroTik CCR1036-8G-2S+, and it thinks the two Ethernet connections to Netonix SW1 are stable and have been up for days. Yet traffic clearly stops flowing for a second or so several times a minute.
What's going on?
Log shows link repeatedly going up but never down
-
Brough - Member
- Posts: 93
- Joined: Tue Dec 09, 2014 3:40 pm
- Location: Boston, MA, USA
- Has thanked: 1 time
- Been thanked: 9 times
Re: Log shows link repeatedly going up but never down
Rebooting the switch has not fixed the problem but the new log entries show a delay in establishing I2C communications with ports 13 & 14:
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Log shows link repeatedly going up but never down
Well, there is obviously something wrong, and you are going to have to diagnose it. If you lived close by I would be happy to help, but you are a long way off.
Possible issues:
1) The Ethernet link is dropping in and out. This is aggravated if the port is hardcoded to 1G instead of Auto, because as soon as the link can no longer hold 1G it drops entirely. Auto is always best if possible. With some fiber modules Auto is not possible, but all copper modules are better used in Auto.
2) The SFP modules are bad - swap them out for a different brand.
3) The switch SFP ports are bad.
If I were you and trying to determine the cause, this is what I would do.
If the ports are set to 1G instead of Auto, I would change them to Auto.
I would determine if the cable is the possible issue by plugging my laptop into the SFP module. If the SFP module is fiber you will need a transceiver and a fiber jumper.
I would try another brand of SFP modules. Some SFP modules do not reset properly from a warm boot (rebooting in UI or from firmware upgrades) and you either need to power cycle the switch or pull and reinsert the SFP modules.
If these are copper SFP modules and your tower grounding is not properly bonded to the electrical service grounds, you could be seeing ground current traversing the Ethernet cables and disrupting Ethernet communications, causing the link to drop in and out. The same can happen if your electrical service ground is insufficient and ground current on the electrical service is using the tower grounds instead, in which case adding a NEW ground rod to the electrical service ground rods would fix it.
If you have a spare switch (which you should), you could swap out the switch using the same config and the same SFP modules, then see if the problem persists. If it does, you know it is not the switch, so look at the cable, devices, SFP modules, and so on.
Pretty much, you need to diagnose and find the issue. If I were there I would do these things and know where my problem was in less than 30 minutes.
Good Luck
Support is handled on the Forums not in Emails and PMs.
Before you ask a question use the Search function to see if it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.
-
Brough - Member
- Posts: 93
- Joined: Tue Dec 09, 2014 3:40 pm
- Location: Boston, MA, USA
- Has thanked: 1 time
- Been thanked: 9 times
Re: Log shows link repeatedly going up but never down
Thanks for the detailed response. Indeed, the copper SFP modules were at fault and the only delay in repairs was cross-town traffic to get to the site. At one time, we routinely used copper SFP modules in ports 13 & 14 for the uplinks to the router. Later we abandoned any use of copper SFPs, but there are still some in the field.
When we first started with copper SFP modules, I bought 20x FiberStore SFP-1GTA-1M modules. We stopped deploying them after several failed during initial burn-in of router-switch configurations, i.e. before the gear ever got to the field. We then converted to Maxx MW-UTP-G-US modules. These appeared to work better, until we had a few field failures 4-6 months in. So we stopped using ports 13 or 14 at all, except at the very few locations where we actually had to connect to a fiber. The Maxx copper SFP modules that failed at this site had been in service for 15 months. We had 2 free ports, so our guy swapped the uplink ports to get ports 13 & 14 out of the picture entirely.
I don't think this was a grounding problem. The copper cables from these SFP modules were 1' Cat6 jumpers between the switch and a MT router within the same rack. There's also no sign of other grounding problems (all current sensors give reasonable readings). The site is a 19 story apartment building with our gear in a 20th floor mechanical penthouse and radios on non-penetrating roof mounts, but the roof mounts are grounded to the rack which is bonded to an electrical conduit, and all the gear is grounded via the third prong electrical ground.
The ports are set to Auto and the switch had been up for 37 days at the time of the failure.
The one puzzle is why the two SFP modules failed at the same time. Looking back through Syslog, i.e. to before I rebooted the switch, the failures started on port 14, and 55 seconds later port 13 failed as well. Once the failures had started, roughly 2/3 of the events were on port 14 and at least 1/3 were on port 13.
Any thoughts?
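For anyone wanting to reproduce the per-port split described above, here is a minimal sketch of how link events could be tallied from exported syslog lines. The log line shape in the regular expression is an assumption (the actual switch log format is not shown in this thread), and `count_link_events` and `port_shares` are hypothetical helper names, so adjust the pattern to whatever your log really contains.

```python
import re
from collections import Counter

# ASSUMPTION: hypothetical log line shape such as
# "... Port 14 link state changed to up". Edit to match your real log.
LINK_EVENT = re.compile(
    r"Port (?P<port>\d+) link state changed to '?(?P<state>up|down)'?",
    re.IGNORECASE,
)


def count_link_events(lines):
    """Return a Counter keyed by (port, state) for every matching line."""
    counts = Counter()
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            key = (int(match.group("port")), match.group("state").lower())
            counts[key] += 1
    return counts


def port_shares(counts, state="up"):
    """Fraction of all `state` events that each port accounts for."""
    per_port = {port: n for (port, s), n in counts.items() if s == state}
    total = sum(per_port.values())
    if total == 0:
        return {}
    return {port: n / total for port, n in per_port.items()}


if __name__ == "__main__":
    # Made-up lines for illustration only, not taken from the real log.
    sample = [
        "Port 14 link state changed to up",
        "Port 13 link state changed to up",
        "Port 14 link state changed to up",
    ]
    print(port_shares(count_link_events(sample)))
```

A log that shows only "up" events, as in this thread, would produce no `(port, "down")` keys at all, which makes that oddity easy to spot.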
-
sirhc - Employee
- Posts: 7416
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1608 times
- Been thanked: 1325 times
Re: Log shows link repeatedly going up but never down
I know you say everything is grounded well but I always tend to look at grounding as the issue in events like this.
As you guys know RF Armor recently started repairing airFIBER radios.
This past month we repaired 12 airFIBER 24 and 24HD radios, and every single one was damaged by ground current, meaning the Ethernet transformer or PHY was burned out.
Read this post on airFIBER repairs: viewtopic.php?f=21&t=2827#p19482
99% of damage to WISP equipment is from ground current.
If it is not ground current, then heat would be my next guess. But the fact that one module went out and another followed shortly after makes me think there is ground current: it burned out the first Transformer or PHY, breaking that current path, then all the current flopped onto the second SFP module and quickly burned it out too.
Another possible cause is STATIC charges where wind blowing over the tower generates static charges which follow the Ethernet cable to Earth ground through the switch.
Last year we modified our switch design so that it will attempt to pass ground current and/or a STATIC discharge through, and it is theoretically capable of passing 10-20 Amps of current in an attempt to protect the Transformers and PHYs. But the SFP module may not be able to carry that load, and the modules are bonded to the other device (the router) through the Ethernet cables.
As I said we started repairing airFIBERS and what we find is once again the majority of damage to airFIBER radios is ground current.
It does not take much variance in ground potential to cause ground current to route across delicate electronics. It could be as simple as the router having a shorter AC power cord to AC ground than the switch. Remember that current favors the path of least resistance; ground is not always ground, potential difference is measured in Volts, and the resistance of each path is measured in Ohms. You could always use fiber modules in this installation.
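To put rough numbers on that, here is a small sketch using nothing more than Ohm's law and the parallel current divider. All values are illustrative assumptions, not measurements from this site, and `loop_current` and `split_current` are hypothetical helper names. It also shows the "current flops onto the second module" idea under the simplifying assumption that the total ground current stays constant.

```python
def loop_current(potential_difference_v, path_resistance_ohm):
    """Ohm's law: current (A) driven through a path by a potential difference (V)."""
    if path_resistance_ohm <= 0:
        raise ValueError("path resistance must be positive")
    return potential_difference_v / path_resistance_ohm


def split_current(total_current_a, resistances_ohm):
    """Current divider: parallel paths share current in proportion to conductance."""
    conductances = [1.0 / r for r in resistances_ohm]
    total_conductance = sum(conductances)
    return [total_current_a * g / total_conductance for g in conductances]


if __name__ == "__main__":
    # ASSUMED illustrative values: 1 V between two "grounds", 0.5 ohm cable path.
    print(loop_current(1.0, 0.5))

    # Two equal parallel paths (think: two copper SFP links) share the current.
    print(split_current(2.0, [0.5, 0.5]))

    # If one path burns open, the remaining path carries all of it
    # (assuming, for simplicity, the total current does not change).
    print(split_current(2.0, [0.5]))
```

The point of the sketch is only that a small potential difference across a low-resistance path can mean Amps, not milliamps.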
Even though you feel it is not a grounding problem, it could be. Ground current can flow UP to the tower if the electrical service ground rods are inefficient. During storms a negative charge builds in the atmosphere, and when the service ground rods get wet in the rain and degrade, all the ground current in the building wants to flow up to the radio, which is sitting in that negatively charged atmosphere.
I recently had a tower start taking damage and the solution was to add a NEW electrical service ground rod and bond to existing ground rods and clean up all the corrosion on the wires and rods.