WLC Discovery Process & Selection

Hello, everyone.

I haven’t found a corresponding NW lesson to this, so I am creating a new forum thread.

3.3.c Describe access point discovery and join process (discovery algorithms, WLC selection process)

I have some questions regarding the WLC discovery process that the OCG mentions:

An AP can be “primed” with up to three controllers: a primary, a secondary,
and a tertiary. These are stored in nonvolatile memory so that the AP can
remember them after a reboot or power failure. Otherwise, if an AP has
previously joined with a controller, it should have stored up to 8 out of a list of
32 WLC addresses that it received from the last controller it joined. The AP
attempts to contact as many controllers as possible to build a list of candidates.

From what I understand, regardless of what we configure, the AP will attempt to discover as many WLCs as possible?

Is this primary/secondary/tertiary configuration done on the AP or the WLC? If these 3 controllers are configured and an AP reboots and discovers all the WLCs, are these the first 3 controllers that it will attempt to join? Also, what is “tertiary”? :smiley:

So does the WLC selection process work the following way?

  1. Try to join the primary WLC. If that fails, try the secondary one, and so on.
  2. If none of these 3 are configured, join the controller that you joined previously
  3. If you didn’t join any controller previously, join the least-loaded one.

Then again, this seems a bit different from what, for example, Kevin Wallace says:

He mentions something called a master controller?

My next question is

Otherwise, if an AP has previously joined with a controller, it should have stored up to 8 out of a list of 32 WLC addresses that it received from the last controller it joined.

The controller tells the AP what other WLCs exist? A Cisco doc says:

Also, the LAP remembers the management IP address of its controller and the controllers present as mobility peers even across reboot. However, as soon as the AP joins another WLC, it only remembers the IP of that new WLC and its mobility peers and not the previous ones.

So if it does associate with a WLC, does the WLC also tell it what other WLCs there are?

It’s all getting randomly mixed up together and I unfortunately don’t get what happens when in which order.

Thank you.
David

Hello David

There are several mechanisms in place that help an AP find and use available WLCs. First of all, we have the “primed” WLCs. These are the primary/secondary/tertiary WLCs. They are manually configured on the AP, and can be set using either the CLI or the GUI of the active WLC. The term “primed” simply means that the AP will try these configured WLCs first, and if one of them fails, it switches over to the next one. More detailed info can be found here:

Yes, that is exactly correct. And the word “tertiary” simply means “third”, it’s the third WLC that it will attempt to connect to.

Other than the primary/secondary/tertiary WLCs which are manually configured, here is how an AP learns about the various other WLC addresses:

  • When an AP joins a WLC, the controller sends it a list of 32 WLCs (its own mobility peers).
  • The AP stores up to 8 of these WLCs (prioritizing primed controllers first), replacing older entries.
  • If the AP later joins a new WLC, it overwrites its stored list with the new controller’s mobility peers.
    • This explains why an AP “forgets” older WLC lists after joining a new controller.
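The bullets above can be sketched as a small model. This is only an illustration of the behavior described in this thread, not how the AP firmware is written; the class name `ApCache`, its fields, and the IP addresses are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List

MAX_STORED = 8  # the OCG quote says the AP keeps up to 8 addresses


@dataclass
class ApCache:
    """Hypothetical model of the WLC addresses an AP remembers across reboots."""
    primed: List[str] = field(default_factory=list)   # primary/secondary/tertiary
    learned: List[str] = field(default_factory=list)  # from the last joined WLC

    def on_join(self, wlc_ip: str, mobility_peers: List[str]) -> None:
        # Joining a controller replaces the learned list. It is not merged
        # with whatever the previously joined controller advertised.
        candidates = [wlc_ip] + [p for p in mobility_peers if p != wlc_ip]
        self.learned = candidates[:MAX_STORED]


ap = ApCache(primed=["10.0.0.1"])
ap.on_join("10.0.0.1", [f"10.0.1.{i}" for i in range(1, 33)])  # 32 peers offered
print(len(ap.learned))   # only 8 are kept
ap.on_join("10.0.9.1", ["10.0.9.2"])
print(ap.learned)        # the old list is gone
```

Note that the primed entries live in a separate field and are not touched by a join, which matches the idea that they are configured rather than learned.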

Now, the term “master controller” refers to a setting on the WLC itself (Master Controller Mode) rather than to a role within a mobility group. As far as I know, it matters only for APs that have no primary/secondary/tertiary WLC configured: if such an AP discovers a controller flagged as master, it prefers that one before falling back to the least-loaded controller. It is meant mainly for staging new APs, so that they all land on one known WLC where you can then assign their primed controllers.

Yes. It is a bit confusing as there is different information found in different documents, but each one talks about a different stage in the process. The whole process is what I have listed above.

I hope this has been helpful!

Laz

Hello Laz.

That makes a lot of sense, thank you.

To verify, the AP finds as many WLCs as possible. Then, it will try them in the following order?

  1. Try the primary controller, then the secondary and then the third one.
  2. What happens here? Does the AP join the least-loaded controller or does it join a previously joined one first?

Also, when it comes to discovering the controller, is there any preference in what comes first? For example, does an L2 broadcast come first, then DHCP, etc.? This Cisco doc says that there isn’t:

And then my CBT Nuggets course is asking for its order.

The CBT Nuggets order is L2 broadcast, DHCP, DNS, and manual config. The OCG order is L2 broadcast, manual config, DHCP, DNS… it’s completely different in all 3 sources :smiley:

And the final question

After an AP has discovered, selected, and joined a controller, it must stay joined to that
controller to remain functional. Consider that a single controller might support as many as
1000 or even 6000 APs—enough to cover a very large building or an entire enterprise. If
something ever causes the controller to fail, a large number of APs will also fail. In the worst case, where a single controller carries the enterprise, the entire wireless network will become unavailable, which might be catastrophic.

Fortunately, a Cisco AP can discover multiple controllers—not just the one that it chooses
to join. If the joined controller becomes unavailable, the AP can simply select the next least-loaded controller and request to join it. That sounds simple, but it is not very deterministic. If a controller full of 1000 APs fails, all 1000 APs must detect the failure, discover other candidate controllers, and then select the least-loaded one to join. During that time, wireless clients can be left stranded with no connectivity. You might envision the controller failure as a commercial airline flight that has just been canceled; everyone who purchased a ticket suddenly joins a mad rush to find another flight out.

The most deterministic approach is to leverage the primary, secondary, and tertiary controller fields that every AP stores. If any of these fields are configured with a controller name or address, the AP knows which three controllers to try in sequence before resorting to a more generic search.

Technically, if the joined controller becomes unavailable, what exactly is the problem if the primary/secondary/tertiary controllers aren’t configured? The APs still maintain a list of up to 8 WLCs, since the joined controller provides a list of WLCs in the same mobility group. So what’s wrong with just joining the least-loaded one? Or is the discovery process reset for these controllers?

That’s all, thank you :slight_smile:
David

Hello David

The answers to these questions are quite complex, and depend on a lot of factors, so I’ll do my best to address them all…

If there are multiple active WLCs available, then one of the criteria that is used to determine which WLC will be chosen is indeed the least loaded controller. But this process is not so simple. How does the AP know which is the least loaded? The AP will first connect to its first choice WLC (either the previous one it was connected to or the primary/secondary/tertiary one). Then the WLC will send info about the state of the other WLCs in the mobility group, including load. At this point the AP may be directed to connect to a less loaded WLC. More on this process can be found at this Cisco Community Forum post. I believe that this level of detail however is beyond the scope of certification purposes.

But there is an important note to keep in mind for this and for your following questions. There are two processes involved here:

  1. WLC discovery - This is just getting a list of available WLCs.
  2. WLC selection - From the list of discovered WLCs, what is the order that an AP uses to choose the one to connect to?

So your questions above have to do with WLC selection. The algorithm is not as simple as “here is a list, try these in this order”. If one of the WLCs in the list is overloaded, there are mechanisms that will make an AP choose another WLC, or even actively switch over to another WLC.

One thing that clarifies this a bit is the fact that we’re talking about two different processes as I mentioned before. The Cisco documentation you shared indicates Wireless LAN controller discovery methods. This is the process by which an AP compiles a list of WLCs. There is no priority or precedence here. As soon as a source of info gives you a WLC address, it just adds it to the list. This is the WLC discovery process. There is no order here.
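To illustrate the “no order” point, here is a minimal sketch of discovery as a plain merge of whatever each source returns. The function name `discover`, the source labels, and the addresses are invented for the example; a real AP does this inside its CAPWAP state machine.

```python
def discover(sources):
    """Merge candidate WLC addresses from every discovery source.

    `sources` maps a source label to the addresses it returned. The order
    of the sources does not matter; the result is just a set of candidates.
    """
    candidates = set()
    for addresses in sources.values():
        candidates.update(addresses)
    return candidates


found = discover({
    "broadcast": ["10.0.0.1"],
    "dhcp_option_43": ["10.0.0.1", "10.0.0.2"],
    "dns": ["10.0.0.3"],
    "stored": [],
})
print(sorted(found))  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```

Because the result is a set, a controller learned from two sources shows up once, and shuffling the sources gives the same list.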

The WLC selection process however does have a precedence. The terminology used in the questions may be misleading. I believe the CBT Nuggets question should read “WLC selection process”.
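A simplified sketch of that precedence, based only on what has been discussed in this thread: primed controllers first, in order, then the least-loaded candidate. It leaves out other steps that some sources list (such as a master controller) and any redirection by the WLC after joining. The function name `select_wlc` and the load ratio are assumptions for illustration, not real CAPWAP fields.

```python
from typing import Dict, List, Optional


def select_wlc(primed: List[str], load: Dict[str, float]) -> Optional[str]:
    """Pick a controller from the discovered candidates.

    primed: primary, secondary, tertiary in that order (may be empty).
    load:   discovered controllers mapped to a load ratio, where 0.0 is
            empty and 1.0 is full (an illustrative stand-in).
    """
    for wlc in primed:
        if wlc in load:              # this primed controller was discovered
            return wlc
    if not load:
        return None                  # nothing was discovered at all
    return min(load, key=load.get)   # fall back to the least-loaded one


discovered = {"wlc-b": 0.9, "wlc-c": 0.1}
print(select_wlc(["wlc-a", "wlc-b"], discovered))  # wlc-b (primary is down)
print(select_wlc([], discovered))                  # wlc-c (least loaded)
```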

So I guess the question remains, what source do you trust? When things like this happen, labbing it up and checking for yourself gives you the most authoritative answer. However, if you can’t do that, then I believe the Cisco documentation takes precedence as far as real-world information is concerned (i.e., info you need for deployment and troubleshooting of production networks). For certification purposes, I would tend to agree with the OCG. So there is a level of confusion here, but the issue is somewhat minor, so as far as certification goes, it’s not something to worry about too much.

If you depend only on the list of WLCs that each AP has compiled, then when a WLC fails, the APs’ fallback behavior is reactive and non-deterministic, which introduces several risks and delays. You don’t know how each AP will choose the next WLC. If you manually configure the primary/secondary/tertiary on your APs, you can arrange them so that different APs have a different order. What’s wrong with just joining the least-loaded one? Well, if all APs do that simultaneously, the least-loaded WLC will quickly become overloaded. Does that make sense?
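Here is a rough worst-case sketch of that rush. It assumes every orphaned AP sees the same load snapshot at the same moment, which is a simplification, and the controller names and AP counts are invented. The second function models an admin who has spread the backup controllers across APs in round-robin fashion.

```python
def failover_least_loaded(ap_count, snapshot):
    """Every AP picks the least-loaded WLC from the same stale snapshot."""
    target = min(snapshot, key=snapshot.get)
    result = dict(snapshot)
    result[target] += ap_count       # all APs land on one controller
    return result


def failover_primed(ap_count, snapshot, order):
    """APs were pre-assigned backups round-robin, as an admin might plan it."""
    result = dict(snapshot)
    for ap in range(ap_count):
        result[order[ap % len(order)]] += 1
    return result


survivors = {"wlc-b": 400, "wlc-c": 350, "wlc-d": 380}  # APs already joined
print(failover_least_loaded(1000, survivors))
print(failover_primed(1000, survivors, ["wlc-b", "wlc-c", "wlc-d"]))
```

In the first case one controller ends up with 1350 APs while the others are unchanged; in the second the 1000 APs are split almost evenly, and you know in advance where each one will go.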

I hope this has been helpful!

Laz

Hello.

Great, that makes sense. I have one final question. These prim/sec/ter controllers can be configured here on the WLC:

How is the configuration here different from doing it in an AP Join Profile instead?

From what I understand, the first configuration is what the WLC presents during the discovery process while the second configuration is what the WLC tells the AP once it associates to it. In other words, if the WLC fails, the AP will switch to the configured backup primary/secondary controllers.

Thank you.
David

Hello David

The configuration differences between setting primary/secondary/tertiary controllers globally on the WLC versus in an AP join profile relate to scope, persistence, and discovery mechanics.

When configured globally on the WLC, the configured controllers apply to all APs connecting to that controller. The controller addresses are included in DHCP Option 43, DNS responses (CISCO-CAPWAP-CONTROLLER), and other discovery mechanisms. This is less granular and is used when most APs in a deployment should follow the same failover hierarchy. An AP retains this initial list until it successfully joins a controller, at which point it receives an updated list from that controller.
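To make the Option 43 part concrete: as far as I know from Cisco’s Option 43 guidance for lightweight APs, the value is a simple TLV, with type 0xf1, a length of 4 bytes per controller, and then each controller’s management IPv4 address. A small sketch that builds the hex string (the function name `option43_hex` and the addresses are just for this example; check the format against the documentation for your AP model before using it):

```python
import ipaddress


def option43_hex(controllers):
    """Build an Option 43 value for lightweight APs as a hex string.

    Layout assumed here: type 0xf1, length = 4 bytes per controller,
    then each IPv4 management address in network byte order.
    """
    value = b"".join(ipaddress.IPv4Address(ip).packed for ip in controllers)
    return (bytes([0xF1, len(value)]) + value).hex()


print(option43_hex(["192.168.10.5", "192.168.10.20"]))
# f108c0a80a05c0a80a14
```

The order of the addresses inside the option does not set a join priority; it only feeds the discovery list.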

And this is where your first screenshot comes in. Here, you can configure the primary, secondary, and tertiary controllers on a per-AP basis. Once an AP joins the controller, these settings are transferred to the AP and define the AP’s static preference list for controller failover.

This has the same result as the manual configuration in the Join Profile, which is the second screenshot you shared. So it’s just a different location from which to configure these parameters.

If both are configured, which one takes precedence? The WLC configuration takes precedence over the Join Profile. The join profile is applied only when the AP initially joins. If the WLC has different parameters to give the AP, those take precedence.

I hope this has been helpful!

Laz