Spanning-Tree LoopGuard and UDLD

lagapidis · June 12, 2019, 7:46am

Hello Marcelo

Remember that Alternate/Blocked ports do not send any BPDUs, but they do receive them and can process them. In this case if you get a unidirectional link failure as described in the lesson, the alternate port will stop receiving BPDUs, and will go into the forwarding state as a result.

I hope this has been helpful!

Laz

d_sevostyanov · September 5, 2019, 10:34am

But what about FlexLinks?

lagapidis · September 6, 2019, 4:35am

Hello Dmitriy

FlexLinks is another method of dealing with L2 loops. You can find out more information about it at this lesson:

I hope this has been helpful!

Laz

jonathon.m.harris · January 4, 2020, 6:14pm

Hi Rene,

Thanks for the great article. I keep finding conflicting information on UDLD Normal and Aggressive mode and I was hoping you or someone could give me a clear answer.

On this Cisco Community forum:

Peter states that both UDLD operation modes will disable the link. Here is a quote from his post - “So the difference between the normal and aggressive modes relates to the difference in handling an implicit uni-directional link event. If an uni-directional link is detected explicitly, the port will always be err-disabled, regardless of the normal/aggressive mode.” He cites U.S. Patent 6765877 and after briefly skimming it, he appears to be correct.

Then on this Cisco Document:

It lists the error conditions that will cause UDLD in Normal mode to err-disable a port. Does this only relate to Nexus devices? Meaning, UDLD Normal mode operates differently on Catalyst and Nexus devices?

I’m getting more and more confused reading about this topic and I’m hoping someone can clear the air for me.

lagapidis · January 8, 2020, 6:42am

Hello Jonathon

This is an interesting point that you bring up and it has given me (and I hope other readers) an opportunity to dig deeper and appreciate the intricate details that engineers have gone into in order to develop and implement these features.

I’d like to point out that according to Peter’s explanation (and the patent), there are two methods of detecting a unidirectional link: explicitly and implicitly. Explicit detection will result in the disabling of the port regardless of the mode of operation. Although Cisco documentation does not explicitly state this (pun intended), it seems to indicate this in this document:

Specifically it says:

If the port does not see its own device/port ID in the incoming UDLD packets for a specific duration of time, the link is considered unidirectional.

This echo-algorithm allows detection of these issues:

Link is up on both sides, however, packets are only received by one side.

Wiring mistakes when receive and transmit fibers are not connected to the same port on the remote side.

Once the unidirectional link is detected by UDLD, the respective port is disabled and this message is printed on the console:

UDLD-3-DISABLE: Unidirectional link detected on port 1/2. Port disabled

Port shutdown by UDLD remains disabled until it is manually reenabled, or until errdisable timeout expires (if configured).

All of the above is regardless of the mode of operation. This corresponds to the explicit method of detection.

In the following section of the same document, it describes implicit detection, which is where the mode of operation kicks in:

Here it states:

UDLD can operate in two modes: normal and aggressive.

In normal mode, if the link state of the port was determined to be bi-directional and the UDLD information times out, no action is taken by UDLD. The port state for UDLD is marked as undetermined. The port behaves according to its STP state.

In aggressive mode, if the link state of the port is determined to be bi-directional and the UDLD information times out while the link on the port is still up, UDLD tries to re-establish the state of the port. If not successful, the port is put into the errdisable state.

Aging of UDLD information happens when the port that runs UDLD does not receive UDLD packets from the neighbor port for duration of hold time. The hold time for the port is dictated by the remote port and depends on the message interval at the remote side. The shorter the message interval, the shorter the hold time and the faster the detection. Recent implementations of UDLD allow configuration of message interval.

The above statement only refers to a case where there is a timeout and UDLD info is not received. This describes the implicit detection, and this is where the mode of operation has meaning.

The documentation does not clearly state that the mode of operation plays no role when explicit detection is observed. However, in the document you shared above (Troubleshoot Uni-Directional Link Detection Errors on Nexus Switches), the examples stated correspond exactly with the patent.

Short of actually performing these tests in a lab, I believe that the Cisco documentation for both IOS and Nexus does agree with the patent, although I do believe that it could have been written more clearly.
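The reading above can be condensed into a small decision function. This is only a sketch of the logic as interpreted in this thread from the patent and the Cisco documents, not something verified in a lab. The function name, the string labels, and the `reestablished` flag are illustrative assumptions and do not correspond to anything in IOS or NX-OS.

```python
def udld_action(detection: str, mode: str, reestablished: bool = False) -> str:
    """Return the resulting UDLD port state, per the reading in this thread.

    detection:     "explicit" (the port does not see its own device/port ID
                   echoed back) or "implicit" (UDLD information timed out).
    mode:          "normal" or "aggressive".
    reestablished: hypothetical flag, only relevant for aggressive mode after
                   a timeout; True if re-establishing the neighbor succeeded.
    """
    if detection not in ("explicit", "implicit"):
        raise ValueError(f"unknown detection type: {detection}")
    if mode not in ("normal", "aggressive"):
        raise ValueError(f"unknown mode: {mode}")

    # Explicit detection: the port is disabled regardless of mode.
    if detection == "explicit":
        return "errdisable"

    # Implicit detection (timeout): this is where the mode matters.
    if mode == "normal":
        return "undetermined"  # no action; the port follows its STP state
    return "bidirectional" if reestablished else "errdisable"


for detection in ("explicit", "implicit"):
    for mode in ("normal", "aggressive"):
        print(detection, mode, "->", udld_action(detection, mode))
```

The only combination that leaves the port up after a problem is an implicit (timeout) event in normal mode, which is where the "normal mode never err-disables" claim in other resources seems to come from.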

I hope this has been helpful!

Laz

jonathon.m.harris · January 8, 2020, 4:59pm

Hello Lazaros,

Thank you for such a detailed and helpful reply! I agree and I’m just assuming the Cisco documentation is worded poorly and should state the method of detection when detecting a unidirectional link - implicit or explicit.

The main confusion for me came from several other resources stating that Normal mode would under no circumstances error disable the link.

However, now we both know that to be incorrect and it depends on whether UDLD implicitly or explicitly detects a unidirectional link.

Will Rene be updating the section to mention how implicit and explicit detection determines the action taken based on UDLD operating mode?

lagapidis · January 9, 2020, 6:25am

Hi Jonathon

I think the main confusion is the fact that Cisco doesn’t make a clear distinction between explicit and implicit detection, and it doesn’t even use that terminology.

I’ll let Rene know so he can take a look and see if he can modify the content to more clearly describe explicit and implicit detection.

Thanks again!

Laz

ReneMolenaar · January 27, 2020, 2:04pm

Hello Jonathon,

I saved this for now in my list of content to check/update. I’ll take a look and improve it.

Rene

quirik · August 29, 2022, 11:46am

Does this mean that a blocked port will always go into the forwarding state when it stops receiving BPDUs?

jisooya993 · August 30, 2022, 9:01am

Hi Rene,

I have read in a Cisco data sheet that UDLD normal mode can also put an interface into the error-disabled state once it detects unidirectional traffic.

So my question is: can UDLD normal mode take action when it detects a unidirectional link?

Thank you.

lagapidis · August 31, 2022, 3:19pm

Hello Quirik

If you take a look at this lesson on Spanning Tree in general, you will see that a port enters the blocked state when the switches on each end of a link share their BPDUs. Both switches compare their bridge IDs: the switch with the lower bridge ID keeps its port forwarding, and the one with the higher bridge ID blocks its port.

This state of a blocked port is maintained as long as those BPDUs are being exchanged. Every time they are received, the priorities are compared.

If BPDUs stop being sent by one of the two switches, as is the case in this lesson with a unidirectional link, the other switch will have no incoming BPDU to compare bridge IDs. After a timeout, the port will begin entering the listening and learning states, and will eventually enter the forwarding state, resulting in an STP loop.
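As a rough illustration of that timeline, here is a minimal Python sketch. It assumes classic 802.1D timers at their default values (max age 20 seconds, forward delay 15 seconds), no Loop Guard, and it ignores message age and RSTP behavior. The function name is hypothetical.

```python
MAX_AGE = 20        # seconds, classic 802.1D default (assumed here)
FORWARD_DELAY = 15  # seconds, classic 802.1D default (assumed here)


def port_state(seconds_since_last_bpdu: float) -> str:
    """State of a formerly blocked port after BPDUs stop arriving (no Loop Guard)."""
    if seconds_since_last_bpdu < MAX_AGE:
        return "blocking"    # the stored BPDU has not aged out yet
    if seconds_since_last_bpdu < MAX_AGE + FORWARD_DELAY:
        return "listening"
    if seconds_since_last_bpdu < MAX_AGE + 2 * FORWARD_DELAY:
        return "learning"
    return "forwarding"      # the loop can form from this point on


for t in (0, 19, 20, 35, 50):
    print(f"{t:>2}s after the last BPDU: {port_state(t)}")
```

With these assumed defaults the port starts forwarding roughly 50 seconds after the last BPDU was received.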

I hope this has been helpful!

Laz

lagapidis · September 2, 2022, 1:39pm

Hello Jisooya

Indeed, normal UDLD mode does place one end of the link in the errdisable state; however, the other end is marked undetermined. This undetermined state does not change the way the port operates. It is simply an informational state that can be reported via SNMP or other network monitoring protocols.

As the text in the link you shared states:

It continues to operate under its current STP status because this mode is informational only; it is potentially less disruptive although it does not prevent STP loops.

So normal mode will detect a unidirectional scenario, but will not prevent STP loops.

I hope this has been helpful!

Laz

castrojuanj · October 29, 2022, 4:59pm

I'm reviewing this topic and I have a basic question.
“Now something goes wrong…the transmit connector on SW2 towards SW3 was eaten by mice failed due to unknown reasons. As a result SW3 is not receiving any BPDUs from SW2 but it can still send traffic to SW2.”

About the above excerpt, my question is:

In my own experience, in a two-strand fiber Ethernet link, when one of the fiber strands is cut, it looks like this:

    Tx  ---------- X (cut) --------->  Rx
    Sw1                                Sw2
    Rx  <------------ OK ------------  Tx

The fiber strand used from Tx on Sw1 to Rx on Sw2 is cut, so the port on Sw2 goes down because it doesn't receive any light from Sw1. So how is it possible to have a unidirectional link scenario in a two-strand fiber Ethernet link?

From what I remember, there is a distinction for 1 Gbps links: they can be forced or autonegotiated (for 10 Gbps or above there is no autonegotiation). For example:

The combinations I remember: with both sides set to auto, both sides come up; with one side set to auto and the other forced, only the forced side comes up.

So my other question, relating to the above and to the following excerpt from this lesson, is:

“Now something goes wrong…the transmit connector on SW2 towards SW3 was eaten by mice failed due to unknown reasons. As a result SW3 is not receiving any BPDUs from SW2 but it can still send traffic to SW2.”

    forced                             forced
    Tx  ---------- X (cut) --------->  Rx
    Sw2                                Sw3
    Rx  <------------ OK ------------  Tx

The port on Sw3 goes down because the fiber strand from Tx on Sw2 to Rx on Sw3 is cut (no light is received on Sw3), so it doesn't receive BPDUs from Sw2. But despite the port on Sw3 being down, will it still send BPDUs towards Sw2?

lagapidis · October 31, 2022, 7:32am

Hello Juan

That’s a good point. Take a look at this Cisco documentation that clarifies this situation:

It says that:

A unidirectional link occurs whenever traffic transmitted by the local device over a link is received by the neighbor but traffic transmitted from the neighbor is not received by the local device. If one of the fiber strands in a pair is disconnected, as long as autonegotiation is active, the link does not stay up. In this case, the logical link is undetermined, and UDLD does not take any action. If both fibers are working normally at Layer 1, then UDLD at Layer 2 determines whether those fibers are connected correctly and whether traffic is flowing bidirectionally between the correct neighbors. This check cannot be performed by autonegotiation, because autonegotiation operates at Layer 1.

So if autonegotiation is enabled (which it is by default), the link will go down when one fiber strand is cut. This is the check at Layer 1, and UDLD is not involved in it. However, if autonegotiation is not enabled, you will need UDLD to determine that this is a unidirectional link.
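The same reasoning can be written as a tiny decision function. This is only a sketch of the logic described in the quoted documentation. The function name and the returned strings are illustrative assumptions, not switch output.

```python
def single_strand_failure(autonegotiation: bool, udld_enabled: bool) -> str:
    """What happens when one strand of a fiber pair is cut, per the text above."""
    if autonegotiation:
        # Layer 1 check: the link does not stay up, so UDLD has nothing to do.
        return "link goes down at Layer 1; UDLD is not involved"
    if udld_enabled:
        # Layer 2 check: the link stays up and UDLD has to notice the failure.
        return "link stays up; UDLD is needed to detect the unidirectional link"
    return "link stays up; the unidirectional link goes undetected"


for autoneg in (True, False):
    for udld in (True, False):
        print(f"autoneg={autoneg}, udld={udld}: {single_strand_failure(autoneg, udld)}")
```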

The scenario of a cut fiber with autonegotiation disabled is not very common in production networks. However, when using other technologies such as satellite links which don’t have autonegotiation, UDLD becomes much more useful. Take a look at this lesson which includes a topology where UDLD is necessary.

I hope this has been helpful!

Laz

gurowar · January 30, 2024, 10:15pm

Sorry guys, I confused myself here. I have a hub-and-spoke topology:

                                         Sw-A
                                           |
                                        Metro-E
                                       /       \
                                   Sw-B         Sw-C

The connection is a Metro-E that connects all 3 sites together. I have a VLAN 1925 that resides in all 3 sites. The subnet for VLAN 1925 is 192.168.3.XX.

Sw-A - 192.168.3.2
Sw-B - 192.168.3.3
Sw-C - 192.168.3.4

I am running OSPF for the 192.168.3.0 subnet

The interfaces that connect SW-A to SW-B and SW-A to SW-C are trunk ports
So with this setup I really shouldn’t have any spanning tree issues, but for the past 2 days I have noticed that, for a brief 2 minutes at about 1 am, OSPF goes down and I see:

SPANTREE-2-LOOPGUARD_BLOCK: Loop guard blocking port te1/1/1

This is the physical interface to the Metro-E. After further investigation, I believe the issue is caused by the way it is set up.

I believe SW-A should be the root bridge, but in reality it is SW-C. SW-A is our main site, and both B and C are remote locations that connect into A for all their compute. I also noticed that at the SW-A location VLAN 1 is enabled but nothing is configured on it, and because te1/1/1 is a trunk port, that is also contributing to the issue. I believe VLAN 1 should be shut down as it’s not in use.

So to fix my nightly SPANTREE-2-LOOPGUARD_BLOCK: Loop guard blocking port messages, I would have to make SW-A the root. Does it matter whether VLAN 1 is shut down or not? Any suggestions would be greatly appreciated!! It’s just weird: I have been here for 4 months now and this only started happening on Monday.

Thank you in advance!!!

lagapidis · February 1, 2024, 8:30am

Hello Warren

Based on the SYSLOG message you shared, it seems that Loopguard is kicking in. When enabled, Loopguard keeps a port in a loop-inconsistent state if it stops receiving BPDUs. This state effectively keeps the port in a blocking mode, even though it’s not receiving BPDUs. If BPDUs are received again on the port, Loop Guard returns the port to its original STP state. For some reason, BPDUs stop being received from either SWB or SWC.
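To make that behavior concrete, here is a toy Python model of a Loop Guard port. It is only a sketch of the description above. The class and method names are hypothetical, and timers such as max age are deliberately left out.

```python
class LoopGuardPort:
    """Toy model of a non-designated port with Loop Guard enabled."""

    def __init__(self, stp_state: str = "blocking") -> None:
        self.stp_state = stp_state      # state STP chose while BPDUs were arriving
        self.loop_inconsistent = False  # set by Loop Guard, cleared by a BPDU

    def bpdus_stopped(self) -> None:
        # Instead of ageing out and moving towards forwarding, the port is
        # held in the loop-inconsistent (effectively blocking) state.
        self.loop_inconsistent = True

    def bpdu_received(self) -> None:
        # BPDUs are back: the port returns to its original STP state.
        self.loop_inconsistent = False

    def effective_state(self) -> str:
        return "loop-inconsistent" if self.loop_inconsistent else self.stp_state


port = LoopGuardPort("blocking")
port.bpdus_stopped()
print(port.effective_state())
port.bpdu_received()
print(port.effective_state())
```

A brief outage followed by recovery, like the 2 minutes you describe, would look like `bpdus_stopped()` followed by `bpdu_received()` in this model.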

Before I go on, I am not completely clear as to the connectivity between the switches. You mention a Hub and Spoke type topology. Does that mean that SWB and SWC communicate via SWA? And SWA is connected only via a single interface to the Metro Ethernet network? And all three switches are on the same VLAN? I don’t see how a hub and spoke topology can take place between three switches on the same Layer 2 segment. Can you explain your topology and connectivity more fully?

Also, is there any indication of which VLAN the Loopguard issue is occurring on? It may not be 1925 so we don’t know what part of the topology may be causing it. If you can find this out also, it would be helpful.

Let us know these details so that we can continue to help you in the troubleshooting process.

I hope this has been helpful!

Laz

gurowar · March 21, 2024, 8:17pm

Hi Laz,

Sorry for the delay:
Thank you sir! What I found out is that SW-C is the master, so I am thinking SW-A should be the master and that would solve the issue. Yes, this is a hub and spoke, so I don’t believe a spoke should be the master, or am I wrong here?

lagapidis · March 23, 2024, 6:38am

Hello Warren

I believe that it would make sense to make SW-A the root bridge in this particular topology. I am still not clear as to the reasons behind the triggering of the loopguard feature, however try it out and let us know your results.

I hope this has been helpful!

Laz

gurowar · March 25, 2024, 4:39pm

Hi Laz,

OK, sounds good. I will change the root to SW-A and see what happens. I’m not sure what is causing the loopguard to trigger; it’s a hub and spoke, so I don’t even know why we have loopguard. I am thinking of disabling it to see what happens. I might do that when I change the root bridge to SW-A, or maybe I will wait first and disable loopguard only if it continues. I will let you all know what happens.

Thank you !!!

bigtiggur · May 14, 2024, 2:47pm

If loopguard kicks in when the non-designated port stops receiving BPDUs, how does the switch differentiate between a loopguard scenario and a case where the root bridge has crashed and is no longer there?