I recently had an issue on a small daisy-chained network with three devices and STP enabled.
Topology:
Switch_01
STP MAC: CCCC
Prio: 32768
Port_25 connected over 10Gbps to Port_09@Switch_02
Port_41 connected over 1Gbps to router
Multiple ports connected over 1Gbps to clients
Switch_02
STP MAC: BBBB
Prio: 32768
Port_02 connected over 2.5Gbps to Port_01@AP/Switch_03
Port_09 connected over 10Gbps to Port_25@Switch_01
AP/Switch_03 (wireless Access Point)
STP MAC: AAAA
Prio: 28672
Port_01 connected over 2.5Gbps to Port_02@Switch_02
There is no redundancy in the network.
When AP/Switch_03 is off everything works fine and Switch_02 will be the Root Bridge because it has a lower MAC address (BBBB versus CCCC).
As soon as AP/Switch_03 is connected and booted it will become the Root Bridge (lower Prio). What happened was that Switch_02 made its Port_02 the Root Port.
Because of the higher Bridge ID for Switch_01 it blocked Port_09 to Switch_01.
As a quick fix we disabled STP, but I could also lower the Prio on Switch_01 to make sure it becomes the Root Bridge.
My question is: As you mentioned in the training, STP does not know the topology, so is this the expected behaviour, or should Switch_02 keep Port_02 open until it receives a BPDU that comes from AP/Switch_03 via its uplink Port_09, which would indicate there is a loop in the network?
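The root bridge election described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bridge ID is modelled as a (priority, MAC) pair where the lowest value wins, the MAC addresses are the placeholder values from the topology above, and the names `Bridge`, `bridge_id` and `elect_root` are made up for this example.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int
    mac: str  # placeholder hex string from the post, not a real MAC address


def bridge_id(bridge: Bridge) -> tuple[int, int]:
    # The bridge ID compares priority first; the MAC address only breaks ties.
    return (bridge.priority, int(bridge.mac, 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    # The bridge with the numerically lowest bridge ID becomes the root bridge.
    return min(bridges, key=bridge_id)


SWITCH_01 = Bridge("Switch_01", 32768, "CCCC")
SWITCH_02 = Bridge("Switch_02", 32768, "BBBB")
AP_SWITCH_03 = Bridge("AP/Switch_03", 28672, "AAAA")

# AP powered off: equal priorities, so the lower MAC address decides.
print(elect_root([SWITCH_01, SWITCH_02]).name)                # Switch_02
# AP powered on: its lower priority wins regardless of MAC address.
print(elect_root([SWITCH_01, SWITCH_02, AP_SWITCH_03]).name)  # AP/Switch_03
```

Lowering the priority of Switch_01 below 28672 (for example to 4096) would make it win the same comparison, which is the alternative fix mentioned above.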
Thanks for the detailed description of your topology and issue! That is indeed strange behavior. Off the bat, I can say that there should be no blocked ports on any switches if there is no physical L2 loop, regardless of which device becomes the root switch. Now if your Port9 on SW2 is becoming blocked, there may be any of the following issues:
1. There is an undetected physical loop. This is the most obvious one, but I assume this has been checked. I mention it here just for completion. I see that switch 3 is also an access point. Could it be that such an L2 loop is due to a wireless link in the topology? Just a thought.
2. Incorrect BPDU processing on Switch2. This can be due to a whole series of issues. What STP protocol are you running on all the devices? What cost scheme is being used, long mode or short mode? What vendor are these switches from? It could be that due to these issues or due to a manual cost configuration, a received BPDU on Port 9 of Switch2 is incorrectly processed, resulting in a blocked port.
A port will become blocked only if Switch1 sends a BPDU to Switch2, telling it that it has another path to the root bridge. But this is not the case, so there seems to be some misinterpretation of the BPDUs being sent. Also, port 25 on Switch 1 must by definition be a root port. The other end of a root port can never be a blocked port; it must be a designated port (i.e. forwarding).
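To make the expected port roles concrete, here is a minimal Python sketch for the daisy chain from the original post. It assumes a loop-free (tree) topology, so every link simply gets a designated port on the side nearer the root and a root port on the far side. It is not a full RSTP implementation, and the `LINKS` structure and `port_roles` function are hypothetical names used only for this illustration.

```python
from collections import deque

# Each link is ((bridge, port), (bridge, port)), taken from the posted topology.
LINKS = [
    (("Switch_01", "Port_25"), ("Switch_02", "Port_09")),
    (("Switch_02", "Port_02"), ("AP/Switch_03", "Port_01")),
]


def port_roles(links, root):
    """Assign port roles by walking outward from the root (tree topologies only)."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a[0], []).append((a, b))
        neighbours.setdefault(b[0], []).append((b, a))

    roles = {}
    visited = {root}
    queue = deque([root])
    while queue:
        bridge = queue.popleft()
        for local, remote in neighbours.get(bridge, []):
            if remote[0] in visited:
                continue  # this is the link back towards the root, already handled
            roles[local] = "designated"  # port facing away from the root
            roles[remote] = "root"       # the neighbour's port facing the root
            visited.add(remote[0])
            queue.append(remote[0])
    return roles


for port, role in port_roles(LINKS, "AP/Switch_03").items():
    print(port, role)
```

With AP/Switch_03 as root, Port_09 on Switch_02 comes out as designated and Port_25 on Switch_01 as root, and no port ends up in a blocking role because there is no redundant link.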
Do you have any logs from the devices to see why this behavior is happening? Let us know so that we can help you further…
Thanks for your very helpful reaction.
That the switches should not block any port when there is no L2 loop is exactly what I thought and why I created this post.
In reaction to point 1.
I found no physical loop, but I sometimes find devices in networks that connect to both the wired and wireless network while bridging them. Then you have a loop. Normal clients can have both active, but they keep them separated. I have seen this in the past, for instance with wired Sonos devices.
Switch 3 is indeed also an access point. It can also create a wireless backhaul for mesh networks. The customer has five of these APs. But the port gets blocked even when there is only one AP connected AND powered up. Also, the backhaul is disabled on all APs.
In reaction to point 2.
So we are using Rapid STP on all switches, but I can’t find info on the cost scheme in use:
• Switch_01 is a Dell N1148T-ON STP Operation Mode Rapid STP
• Switch_02 is a Netgear MS510TXUP STP Operation Mode RSTP
• AP/Switch_03 is a Netgear WAX630E STP Operation Mode RSTP
As we have disabled STP on Switch_02 and all APs, I now see that port41 (the uplink to the Fortigate router) on Switch_01 is the root port.
I need to check the STP Priority setting on the router but have not found it yet.
For now I think it’s due to incorrect BPDU processing on Switch_02!
So I will report it to Netgear.
A host that connects to both the wireless and wired networks should not create a loop. Hosts are not configured to route transient traffic, i.e., they don’t act like switches. So any BPDU that may be sent to such a host via the wired link will never be forwarded out of the wireless link or vice versa (unless the host is configured to simulate a switch).
If it is due to meshing, then you have eliminated this possibility since blocking still takes place with only one AP.
Rapid STP typically uses the long mode costs by definition, so if you’re using RSTP on every device, you should be OK.
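For reference, the difference between the two cost schemes can be sketched as below. The short-mode values are the commonly cited defaults from IEEE 802.1D-1998 and the long-mode values follow the 20,000,000 / speed-in-Mbps rule recommended in IEEE 802.1D-2004. Treat this as an assumption about defaults: a given vendor may use other values, the short table has no standard entry for 2.5Gbps, and the function names are made up for this example.

```python
# Short-mode (16-bit) default path costs per link speed in Mbps.
SHORT_COSTS = {10: 100, 100: 19, 1_000: 4, 10_000: 2}


def short_cost(speed_mbps: int):
    # Returns None when the short table defines no value for this speed.
    return SHORT_COSTS.get(speed_mbps)


def long_cost(speed_mbps: int) -> int:
    # Long-mode (32-bit) recommended path cost: 20,000,000 / speed in Mbps.
    return 20_000_000 // speed_mbps


for speed in (1_000, 2_500, 10_000):
    print(speed, "Mbps  short:", short_cost(speed), " long:", long_cost(speed))
```

If one switch advertised costs in one scheme and its neighbour assumed the other, the root path costs in the BPDUs would differ by several orders of magnitude, which is why the cost scheme is worth confirming on each device.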
I have a feeling that it has to do with the mix of vendors and how they implement STP. If you’re able to take a look at some logs, it may be enlightening to see why the switch goes into blocked mode. What kind of BPDU caused it? Logs from both the Dell and the Netgear (SW2) would be helpful. Let us know how your investigation is proceeding!
I have the log for the Netgear switch and it shows port 9 as blocked.
For now we have a workaround and I have reported it to Netgear, so this issue does not need any further investigation by me.