My understanding is that if there is more than one mapping agent on the network, the effect on the source and recipient routers is cumulative - that is, the mapping tables (groups to RPs) from each agent are added together on those routers, so they end up with a map of different groups on different RPs. (Though it seems that in some cases, such as NX-OS, this may not be correct.)
But what would happen if somebody adds a new mapping agent (and RP) to the network, by mistake or maliciously, that advertises RP ‘2’ for the same group already served by RP ‘1’? The client or source router now has two records for the same group - one saying RP ‘1’ is authoritative for it and the other saying it is RP ‘2’. Which RP will the source router register with, and which RP will the client router join? I see a potential problem here, such as a DoS attack.
In the worst case, if only one mapping agent is listened to (again, it seems at least NX-OS behaves like that), what would help mitigate the potential issue when a rogue mapping agent and RP are added to the network?
Yes, that is correct. The process of choosing the RP is done independently for each group. Also, each mapping agent makes these decisions independently, regardless of what other mapping agents do. As stated in this Cisco documentation:
The mapping agent receives announcements of intention to become the RP from Candidate-RPs. The mapping agent then announces the winner of the RP election. This announcement is made independently of the decisions by the other mapping agents.
In addition, when there are multiple mapping agents, conflicts are resolved following these rules:
If there are two announcements with the same group range but different RPs, the mapping agent will select the announcement with the highest RP IP address.
If there are two announcements where one group is a subset of another but the RPs are different, both will be sent.
All other announcements are grouped together without any conflict resolution.
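A rough sketch of those three rules in Python, purely for illustration. The function name resolve_announcements and the sample addresses are made up here, and this models only the rules as described above, not actual IOS behavior:

```python
import ipaddress

def resolve_announcements(announcements):
    """Model the mapping agent rules described above (hypothetical helper).

    announcements: iterable of (group_range, rp_address) strings.
    Returns a sorted list of (group_range, rp_address) pairs to advertise.
    """
    winners = {}
    for group_range, rp in announcements:
        net = ipaddress.ip_network(group_range)
        rp_ip = ipaddress.ip_address(rp)
        current = winners.get(net)
        # Same group range, different RPs: the highest RP IP address wins.
        if current is None or rp_ip > current:
            winners[net] = rp_ip
    # Subset ranges and unrelated ranges are distinct keys, so all of them
    # are kept and advertised without further conflict resolution.
    return sorted((str(net), str(rp)) for net, rp in winners.items())

# Sample data (invented for this example).
sample = [
    ("239.1.0.0/16", "10.0.0.1"),
    ("239.1.0.0/16", "10.0.0.2"),  # same range, higher RP address
    ("239.1.1.0/24", "10.0.0.1"),  # subset of the /16, different RP
]
print(resolve_announcements(sample))
```

With this sample, the /16 range ends up mapped to 10.0.0.2 and the /24 subset is advertised alongside it with 10.0.0.1.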
The mapping agents send out their conclusions, based on these rules, to the 224.0.1.40 multicast group (the Auto-RP discovery group), which all regular routers join. Based on this content, each router is responsible for populating its own Auto-RP cache with the group-to-RP mappings. The cache contains both negative and positive entries.
When looking for an RP for a particular group, the Auto-RP algorithm will look through the negative entries. If there is a match to a negative entry, no RP is used, and the group is considered to operate in dense mode. Any RP information in the negative entries will be ignored since RPs are not used in dense mode.
If the group does not match a negative entry, the algorithm will begin to search in the positive list.
In this list, because each group range corresponds to a particular RP, there may be conflicts where multiple RPs try to serve overlapping group ranges. The receiving router uses the longest match rule to resolve all conflicts. If there are multiple matches, only the one with the longest prefix length is selected.
So you see, any conflicts, even due to malicious misinformation given by a rogue mapping agent, are ultimately resolved within each multicast router with very specific criteria.
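To make the lookup order concrete, here is a minimal Python sketch of what a receiving router does with its cache as described above: negative entries first, then the longest match among the positive entries. The function name select_rp and all addresses are invented for the example, and keeping the first entry on an equal prefix length is an assumption of this sketch, not something stated above:

```python
import ipaddress

def select_rp(group, positive, negative):
    """Return the RP address for a group, or None if no RP applies.

    positive: list of (group_range, rp_address) entries.
    negative: list of group_range entries (groups that run in dense mode).
    """
    addr = ipaddress.ip_address(group)
    # Step 1: a match on any negative entry means no RP is used (dense mode).
    for prefix in negative:
        if addr in ipaddress.ip_network(prefix):
            return None
    # Step 2: among matching positive entries, keep the longest prefix.
    # Assumption: on equal prefix lengths the first entry seen is kept.
    best = None
    for prefix, rp in positive:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, rp)
    return best[1] if best else None

# Sample cache contents (invented for this example).
positive = [("239.0.0.0/8", "10.0.0.1"), ("239.1.1.0/24", "10.0.0.2")]
negative = ["239.1.2.0/24"]
print(select_rp("239.1.1.5", positive, negative))  # the /24 entry is the longest match
print(select_rp("239.1.2.7", positive, negative))  # matches a negative entry
```

Note that a rogue entry with a longer prefix would win this lookup for the groups it covers, which is exactly the case raised in the question.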
Thank you, that helps clarify the issue. It seems there is potential for somebody to configure an RP on the network that advertises some overlapping groups and give it the highest IP address of all the RPs. But that should not be a problem - the new RP simply becomes the RP, which is not a big deal since the traffic through the RP is very small (clients switch to source trees as soon as they get the first packet from the RP), and so the location of the RP is not hugely important. One way or the other, traffic will be established between source and recipient.
That’s a great observation. Indeed, it seems that R3 should not have an mroute for 224.0.1.39, since mapping agents use this address to listen for candidate RP announcements. It makes sense for R2 to have it, but what about R3?
Well, it turns out that the mapping agent (R2) will send out IGMP membership reports for the 224.0.1.39 group as soon as the send-rp-discovery command is applied.
This is confirmed with the following command on R3:
If you enable debugs you will see the relevant membership report as well. This is why the group appears in the mroute table of R3, not because R3 received an RP announcement, but because R2 sent an IGMP membership report.
After reading all the lessons and having some very helpful discussions, it was all crystal clear to me. Then my pesky brain started to process it all again and looked at it from a different, more practical perspective. And it asked me a question I am now struggling to answer. It seems my ‘crystal clear understanding’ was more of an illusion after all. Happens quite often. So I went for a second reading.
The question is - how often in real life do we really need to deal with Auto-RP and/or BSR? Put differently, regardless of how often it gets set up, how often was Auto-RP really needed? Here is what I mean for large networks (for small ones it’s not really an issue either way):
Traffic to the RP is really very small in the case of (hopefully) sparse mode - just a couple of packets from receivers and a small stream of register packets from sources. Once the join is submitted and the receiver gets one packet of multicast traffic, it switches to the source tree, which no longer involves the RP and uses the shortest path to the source. The conclusion is that we don’t need a high number of RPs - maybe just one per region/site.
Setting up multiple Auto-RPs is a pain by itself, particularly if they are to service different groups - how do you manage all these records on a large network with multiple Auto-RPs?
Then there is also the setup of mapping agents - these may or may not be on the same routers as the RPs. Additional configuration.
Then all the routers running IOS and IOS XE need to be set up as ‘receivers’ - if that is missed, we are in trouble. And if we have non-Cisco hardware, we are in trouble too when using Auto-RP.
We need to be careful about the IPs of the RPs and agents - the highest ones win selection, particularly on NX-OS platforms, which can only process a single list of groups from the agent with the highest IP on the network.
Auto-RPs and agents constantly flood the network with announcements. However small, it is still a constant flow of traffic from multiple RPs/agents.
Troubleshooting multicast issues also involves troubleshooting Auto-RP/BSR issues, which can be more complicated than the multicast data traffic itself.
That does not look very nice. An alternative approach is to use static RP with a minimal number of RPs, and I cannot find a reason why it would not be better in 99% of cases, or why we would want to deal with Auto-RP or BSR.
Every router on the network needs to be configured with several commands for multicast, Auto-RP or not. It’s no worse to configure a router with a static RP address than as a receiver for Auto-RP traffic.
A single RP at a particular site can handle thousands of receivers (it only needs to reflect a couple of packets for each).
No dealing with Auto-RP protocol issues, like routing and additional configuration, etc.
A single IP can be used for RPs at different sites, both for redundancy and to avoid reaching RPs across sites (with anycast and MSDP).
Easy to maintain list of groups - all the RPs have the same list.
Generally, the RP is something that should never change; there is little reason for it. But if it does happen, it’s easier to push a single command changing the RP address to all the routers on the network with a simple script than to deal with the potential issues of changing Auto-RP addresses (IP rank, routing problems, firewall filtering, etc.).
No problems with non-Cisco vendors - as long as they can run multicast, they should understand what an RP is and be able to use it.
Icing on the cake - no flood of traffic from RP and Agents announcements and no dealing with figuring out what group is serviced by what RP.
So it seems that minimizing the number of RPs to literally a single anycast IP and going with static RP makes sense, while there are no obvious reasons to deal with Auto-RP or BSR, or with multiple RPs in general.
Is it incorrect to think so?
I can understand your train of thought here, and there are some valid points. However, you must keep in mind that the primary benefits of Auto-RP are flexibility, scalability, and simplified configuration. For networks of, say, 10 or 15 routers (which may sound large), you may still be able to handle the changes needed for any multicast modifications, but there are corporations and enterprises with hundreds or even thousands of routers in networks that span whole countries and continents. I don’t believe your train of thought would satisfy the complexity of such networks.
The issue is not a matter of traffic, or of configuration complexity, or of troubleshooting. The primary benefits of Auto-RP as opposed to manual configuration include:
Extreme scalability - Auto-RP is designed to scale to support large networks with many multicast groups and RPs. It allows you to add or remove RPs without the need to manually reconfigure the multicast routing protocol. This can be especially beneficial in networks that are constantly growing or changing.
Simplified configuration - even though you mention manual configuration may be simple, there are networks that change often and need to be able to adapt automatically to these changes rather than relying on a script or on planned manual configuration changes that are prone to errors for every modification needed.
Fault tolerance - this is an area that manual configuration cannot resolve. Not only is Auto-RP fault tolerant in that you can have multiple RPs, but the actual discovery process itself is fault tolerant. If an RP fails, Auto-RP can automatically detect the failure and elect a new RP. This can help ensure that multicast traffic continues to flow without interruption.
Security - this is another area where manual configuration falls short. Auto-RP includes security mechanisms to help prevent unauthorized hosts from becoming RPs or injecting false RP advertisements into the network. This can help improve the security of the multicast network and prevent network disruptions caused by malicious actors.
Now keep in mind that there is no right or wrong answer here. Each administrator chooses the solution that best fits the needs of the particular network. And for some networks, you’re right, a manual configuration may be preferable. However, there are situations where Auto-RP, as well as BSR, are preferable, and I just wanted to share some of the reasons for that.
Thank you, Laz, that helps to understand why Auto-RP exists.
That said, it also kind of confirms my train of thought. It actually comes from my current experience with a very large network where Auto-RP is exactly what is causing enough problems to start rethinking that design. That train of thought came from weighing some of the points you highlighted against the following counter-arguments:
Scalability - that’s the main question, and it seems static RP might win here. The volume of traffic through the RP is very small (in sparse mode), so there seems to be no particular reason to create multiple RPs for multiple groups to reduce the load on each RP router. If that’s the case, then minimizing the number of RPs is the way to go. Having multiple Auto-RPs, each with its own set of groups and corresponding agents, all advertising themselves incessantly, and tracking every potential change in the mapping of this or that group to this or that RP, does not look like a sweet dream for NetOps. In contrast, reducing the RPs to just several, all serving the same set of groups, looks very appealing from a management point of view. Plus, there are no announcements to deal with. So, unless we are concerned with the amount of traffic that has to pass through the RP, static RP seems much simpler and ‘cleaner’ to run. Changing a static RP may seem more difficult than changing Auto-RP, but it should be a very rare event (like re-addressing the whole network?). I shudder to think of any sufficiently large network that needs to change its entire IP scheme very often. But even that should not be an issue in our days of SDN and automation buzz. If it is, I’m afraid changing multiple Auto-RPs with all their groups would bring enough pain for the IT owners to think hard about it. So the only real question is the ability of a single RP to sustain traffic for many clients.
Configuration simplicity - again, a script that contains really one single command for the RP and targets a list of all routers on the network (and such a list is a must for any network, particularly a large one) is simple enough and does not leave much room for error (possibly missing a couple of routers?). Any network that changes its RPs often probably has more trouble than just RPs. And only the clients need to be configured, nothing on the RPs themselves. That makes things a little simpler, particularly considering the basis for the whole argument - not having multiple RPs on the network dedicated to various groups (that’s the only argument I see in favor of resorting to Auto-RP - when one decides to have many different RPs, Auto-RP becomes a necessity). The point here is that all (or almost all) client routers are configured with the same single RP address. Much simpler.
Fault tolerance - it seems to be the same as for Auto-RP. That is, configure a score of RPs as anycast with the same IP and tie them together with MSDP. One of the following lessons does exactly that.
Security - I don’t see where static RP has problems with that. We can still have ACLs that define which groups we are advertising, and we can configure boundaries to keep multicast from entering or leaving the network. Auto-RP really needs to be protected from a rogue Auto-RP taking over, and in some cases that may be virtually impossible (like someone installing an Auto-RP and agent with an IP higher than any other on the network - at the very least, all Nexus-based routers will only listen to the single highest IP). That is not an issue with static RP - the RP is whatever is configured on the client, and no other RP screaming ‘you have to listen to me’ makes any difference. One may be able to reset a couple of client routers to the wrong RP address, but that’s all there is, in contrast to Auto-RP, where a single wrong one may affect the whole network.
The point here is that I’m not just arguing about which design is better in a particular case, but rather trying to find the fault points of a design with a minimized number of static RPs. In the past, Auto-RP could be an important option in some cases (though even then, if the network is well managed…), but today, with the shift to SDN and the availability of SDN-like capabilities, it seems it may only be an option for very rare and odd cases.
But that, of course, is only valid if I’m correct to assume that a single RP passes only very little traffic and can support thousands of clients without being overloaded.
I appreciate your thoroughness in describing your understanding. The RP will indeed route only the first few multicast packets of a particular communication. However, this does not mean that it will remain relatively dormant. An RP will use a significant amount of its CPU and memory resources, especially in large networks.
Of course the amount of resources consumed by an RP depends on various factors, such as the number of multicast groups, the number of multicast sources, the frequency of multicast traffic, the size of the multicast routing table, and the capacity of the RP router itself.
In general, a large multicast deployment with a significant number of multicast groups and sources can require a significant amount of CPU resources for the RP router to handle the control plane operations associated with maintaining the multicast forwarding state. This amount can increase if the RP router is configured to support complex multicast routing protocols or features, such as Bidirectional PIM or Source-Specific Multicast as well.
So the issue of scalability and dynamic fault tolerance in a network where multicast is vital is still in favor of an Auto-RP deployment.
Can you create a topology with a statically assigned RP? Yes, absolutely, and if you take the precautions you mention, it may function just great. However, I am still on the side of Auto-RP for larger networks simply because the engineers who designed it recommend it, as does Cisco itself, and they’ve all done their due diligence to ensure that it is indeed more suitable for such scenarios…
Thank you for the thought experiment, however. I have found it very useful, and I hope you have found our discussion helpful as well!
That was a very useful discussion. It seems the most important factor to consider is the amount of resources consumed by the RP, notwithstanding the expected low traffic through it. The other factor is administration (which seems to have been the original primary reason for using Auto-RP or BSR), but, again, in today’s SDN-oriented environment I don’t see it as being as important as it used to be. If anything, it was the administrative overhead of dealing with either of the RP protocols that motivated me to consider other options.
I will do some experimenting to determine the load the RP function places on a router, but I would still be surprised to see it so significant as to demand multiple RPs dedicated to different groups. I would expect half a dozen dedicated RP routers to be able to support a hundred or so sources and thousands of clients with up to two dozen groups. The relief of not running additional protocols (which itself saves some resources), not troubleshooting them, not keeping track of which RP serves which groups, and not dealing with configuration and misconfiguration of RP protocols (just one misconfigured RP router has the potential to overtake the ‘legitimate’ Auto-RP and damage the whole multicast network, in contrast to static RPs, where a misconfigured router can only hurt itself) should, in my view, justify the static RP concept.
The fact that so many engineers still stick with Auto-RP or BSR certainly makes me mindful of a possible pitfall with resources, but I can also easily attribute it to a certain inertia of thinking. I see it all the time, even with highly skilled and experienced engineers. So I will need to test in a real environment. I feel sufficiently motivated by the idea to try it out.
I’m glad the discussion was useful. It was useful for me as well.
Yes, I believe that the only way to determine what is best for any particular topology is to experiment with the implementations and examine the impact on the resources of the RP. I look forward to hearing about the results of your experimentation. Keep us posted!
What if I wanted to get my device (listener) to igmp-join a specific group, say 18.104.22.168 - will Auto-RP automatically detect it? Do I still need to configure the IGMP join request on the router closest to the device?
The Auto-RP feature simply automates the selection of the RP in a multicast topology. It does not affect the way that a particular host chooses to join a multicast group. Therefore, to answer your question directly, your listener must still issue the igmp-join command to join a particular multicast group. Does that make sense?
R1 is the RP, R6 is the MA. R6’s directly connected PIM neighbors (R2, R5 & R7) are receiving the MA (22.214.171.124) message telling them who the RP is (126.96.36.199), but, as expected, that is not the case for R3, R4 & R8.
Then I issued ip pim autorp listener only on R2 to test whether R2 forwards the MA 188.8.131.52 messages towards R3, and it worked as expected (R3 has the info for the RP 184.108.40.206).
Then I issued no ip pim autorp listener on R2 and waited a couple of minutes; tun0 then went down on R3, so it lost the RP info, as expected.
Later, I swapped the R2 ↔ R3 WAN IP addresses (R2 now has 192.168.23.3 and R3 has 192.168.23.2, making R2 the PIM DR for the 192.168.23.0/24 subnet). After that, tun0 on R3 came up and R3 again has the info for RP 220.127.116.11 without needing ip pim autorp listener on R2.
Thanks for the detailed topology and test. It looks like you confirmed this behavior. Since R2 became the DR between R2 and R3, it was able to forward the mapping advertisements to R3 without the ip pim autorp listener command being configured on R2.