VXLAN MP-BGP EVPN L3 VNI

The 9000v images are among the few that you can download without a contract.

https://software.cisco.com/download/home/286312239/type/282088129/release/10.4(5)?i=!pp

Hi,

For L3 VNI or L2 VNI, if a host in VLAN 10 needs to connect to an external network or a host in a different location, such as a branch, what would the topology and traffic flow look like?

Should we create an interface VLAN on each switch and enable OSPF?
How would the Leafs communicate with the border leaf or spine in this case?

Hello Diyaa

If a host in VLAN 10 needs to reach an external network or a host at a different location, you simply route the traffic toward the destination subnet. In the example in the lesson, S1 in VLAN 10 is trying to reach a subnet other than its own (i.e. VLAN 20, where S2 is). The traffic flow would look the same as in the lesson, but instead of the leaf that hosts VLAN 20, the destination VTEP would be what is known as a border leaf switch. This is a VXLAN switch that acts as a gateway between the VXLAN overlay and external networks. So in place of S2, you would have some network device that connects to a network outside of the VXLAN domain. Does that make sense?

I hope this has been helpful!

Laz

Thanks for your reply.

Just to clarify, let’s say Leaf-1 acts as a Border Leaf switch and has a link to Router-A. Router-A has a default route, and we need to redistribute this route to Leaf-1 and Leaf-2.

Should Router-A be part of VRF CUSTOMER, or does it have to be part of the default VRF?

Hello Diyaa

Yes, exactly, that’s the idea. You must somehow advertise that route into the fabric so you can reach destinations outside of your network.

That depends upon the intended scope of the default route and the design requirements. If the default route is only for the specific tenant that is served by that VRF, then it can be in the same VRF. However, if its purpose is to provide connectivity to multiple VRFs, then it would be preferable to put it in the global routing table (which is the default VRF).
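
To illustrate the tenant-scoped case, here is a minimal Python sketch of a longest-prefix-match lookup in Leaf-2's view of VRF CUSTOMER after the default route has been advertised into the fabric. The prefixes, the next-hop labels, and the `lookup` helper are assumptions made up for illustration; they are not taken from the lesson configs.

```python
import ipaddress


def lookup(table, dst):
    """Return the next hop of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific route (longest prefix length) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical contents of VRF CUSTOMER on Leaf-2.
vrf_customer = {
    "192.168.20.0/24": "local VLAN 20",
    "192.168.10.0/24": "LEAF1",
    "0.0.0.0/0": "LEAF1 (border leaf)",  # default route learned from Router-A
}

external = lookup(vrf_customer, "8.8.8.8")
internal = lookup(vrf_customer, "192.168.10.5")
```

Any destination without a more specific route in the VRF falls through to the default route and is sent to the border leaf, while tenant subnets keep using their own routes.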

I hope this has been helpful!

Laz

Hi,
Can you please explain the packet flow of inter-vlan / inter-subnet host communication in flood and learn mechanism?

Hello Rahul

Let’s start off with a high level overview of “inter-subnet” communication in a flood-and-learn–based VXLAN environment. Although modern VXLAN deployments typically use MP-BGP EVPN for more efficient control-plane learning, understanding the original “flood and learn” approach helps clarify the core forwarding concepts.

Inter-subnet routing in VXLAN involves two big steps:

  1. Traffic is routed from the source VLAN to a Layer 3 interface (the default gateway), typically on the local VTEP, but not always.
  2. The packet is VXLAN-encapsulated with the L3 VNI (VRF context) or re-injected into another L2 VNI, depending on the design. It then traverses the underlay network to the remote VTEP that hosts the destination subnet, and finally is switched out toward the target host’s VLAN.

Let’s dig deeper and take a look at some more detailed steps of this process:

Let’s assume we have two hosts, Host A (IP in VLAN 10) and Host B (IP in VLAN 20). VLAN 10 and VLAN 20 each map to different VXLAN Layer 2 VNIs, and there is typically one L3 VNI (VRF) for routing.

ARP Resolution (Initial Steps)

  1. Host A needs to communicate with Host B, which is in a different subnet. Host A knows that the destination is not in its local subnet, so it first resolves its default gateway MAC address. If Host A’s ARP cache is empty, it sends an ARP request for the default gateway IP.
  2. Because this is broadcast traffic, the local VTEP floods the ARP request over VLAN 10 locally. In a flood-and-learn VXLAN design, unknown or broadcast traffic can also be sent to other VTEPs in the same L2 VNI via multicast (or head-end replication, depending on configuration). However, for the default gateway, the local VTEP itself typically responds (via the integrated Anycast Gateway or SVI).
  3. The local VTEP replies to Host A with the gateway MAC address. Host A now knows how to reach the default gateway.

Routing at the Source VTEP

  1. Host A sends the Layer 2 frame to the default gateway MAC, which is on the local VTEP. The VTEP performs an L3 lookup (the destination IP is in a different subnet) and makes a routing decision to forward from VLAN 10’s IP interface to VLAN 20’s IP interface inside its VRF.
  2. After this routing function, the local VTEP encapsulates the traffic into a VXLAN packet. Because the traffic is bound for a different VLAN, it uses the L3 VNI (or re-injects into the remote VLAN’s L2 VNI after the route). For a simple flood-and-learn design, the destination VTEP for VLAN 20 might be determined by how MAC addresses were learned previously or by flooding BUM traffic.

Traversal in the Underlay

  1. The VXLAN-encapsulated packet (outer IP/UDP header + VXLAN header) traverses the underlay network. Multicast groups or unicast replication might be used if the VTEP doesn’t have an explicit remote mapping. In a pure flood-and-learn model, unknown unicast traffic (if the host’s location is unknown) could also be flooded.

Decapsulation and Delivery at the Remote VTEP

  1. The VTEP that owns VLAN 20 for Host B receives the encapsulated packet, decapsulates it, and performs the routing/forwarding decision for VLAN 20 locally. It sees that Host B’s MAC/IP is local, does an ARP or MAC lookup, and forwards the packet out of VLAN 20.

  2. Host B receives the traffic.

Response Traffic from Host B to Host A

  1. Host B, in VLAN 20, needs to send a response to Host A in VLAN 10. The same steps happen in reverse: ARP for the gateway if needed, local route at the remote VTEP (Host B’s leaf), VXLAN encapsulation with the same or different VTEP destination, and traversal back to the local VTEP hosting VLAN 10.
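
The steps above can be sketched as a small Python model. This is only a rough illustration under stated assumptions: the `Vtep` class, the `forward` function, the host IPs, and the leaf names are hypothetical, and the model ignores MAC rewriting, VNIs, and the underlay entirely. It only shows that the first packet triggers flooding, while later packets reuse what was learned.

```python
from dataclasses import dataclass, field


@dataclass
class Vtep:
    name: str
    local_hosts: dict = field(default_factory=dict)  # host IP -> VLAN
    learned: dict = field(default_factory=dict)      # host IP -> remote VTEP name


def forward(src, fabric, src_ip, dst_ip):
    """Trace one routed packet from src_ip to dst_ip and return the steps taken."""
    steps = [f"{src.name}: gateway for VLAN {src.local_hosts[src_ip]} routes packet to {dst_ip}"]
    if dst_ip not in src.learned:
        # Flood and learn: there is no control plane, so the host is located by flooding.
        steps.append(f"{src.name}: {dst_ip} unknown, flood to the other VTEPs")
        for vtep in fabric:
            if vtep is not src and dst_ip in vtep.local_hosts:
                src.learned[dst_ip] = vtep.name
    remote = next(v for v in fabric if v.name == src.learned[dst_ip])
    steps.append(f"{src.name}: VXLAN-encapsulate, outer destination {remote.name}")
    steps.append(f"{remote.name}: decapsulate, deliver on VLAN {remote.local_hosts[dst_ip]}")
    return steps


leaf1 = Vtep("LEAF1", local_hosts={"192.168.10.1": 10})  # Host A
leaf2 = Vtep("LEAF2", local_hosts={"192.168.20.2": 20})  # Host B
fabric = [leaf1, leaf2]

first = forward(leaf1, fabric, "192.168.10.1", "192.168.20.2")   # includes a flood step
second = forward(leaf1, fabric, "192.168.10.1", "192.168.20.2")  # no flood, already learned
```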

I hope this has been helpful!

Laz

“This however requires that you have set up BGP EVPN.”

As we have already configured BGP EVPN in this lesson, surely just adding ingress-replication protocol bgp and removing the multicast config (mcast-group 239.1.1.1) would fix this on its own?

Just making sure I’ve got this correct.

Cheers

Rich

Hello Rjbotham

You are correct, simply taking the steps you suggest would resolve the issue. My earlier statement is part of a longer thread and conversation with Aqueel that starts here. Because the conversation was being steered away from the scenario in the lesson, I was just confirming that BGP EVPN is still required for what was being discussed. If you read the whole thread, you’ll get the gist of it.

I hope this has been helpful!

Laz

The final configs include mcast-group 239.1.1.1 on the NVE interfaces of the leafs, but I don’t think it was mentioned in the video. My NVE peers wouldn’t come up without this in my lab. Feature PIM was left disabled both in this lesson and in my lab. Why did I need to add the mcast group to the NVE, and why did it work without also enabling PIM?

Hello Rafa

Good catch! That’s an important part of this particular configuration. Strictly speaking, as stated in the lesson:

Technically, we don’t need multicast in the underlay network because we’ll only have inter-VNI traffic. We don’t have intra-VNI traffic between hosts within an L2 VNI. Because of this, I didn’t configure PIM on the switches.

So Rene followed this approach in the lesson. Indeed, the multicast commands in the lesson and in the final configs under the NVE interfaces were unnecessary. Now Rene went over this lesson again with me, and it turns out that the video configuration is correct and it works. You can get the required configs here:

Now I don’t know why in your particular situation the configuration didn’t work until you put in those commands. You need something to deal with BUM traffic. In the video, Rene included the host-reachability command under the NVE, so that may have resolved the issue. You may not have had that, and the multicast command resolved it. In any case, all of this info gives you more direction in your troubleshooting process. Let us know how you get along!

I hope this has been helpful!

Laz

Hello.

Thanks for the lesson.

I wanted to ask something, out of ignorance, about the limitations of MP-BGP: why is it not possible to advertise an IP-MAC mapping via MP-BGP? I was thinking that, if it were possible, it would eliminate the need to configure multicast flood and learn for unknown unicast traffic, because the leaf switches could reply to ARP requests as a proxy. ARP suppression still has the inconvenience that the very first time a downstream host sends an ARP request for a remote host in the same VNI, we still need multicast to get the ARP request to the remote VTEP. Why not learn both IP and MAC via MP-BGP and just have multicast on the underlay for broadcasts and multicasts?

Thanks,

Jose

Hello Jose

Actually, MP-BGP EVPN does exactly what you’re asking for. It advertises IP-MAC mappings, and this is one of the fundamental features that makes EVPN so powerful. EVPN Type 2 routes are specifically designed to carry both MAC addresses AND their associated IP addresses in BGP updates.

However, you correctly noted that “at the very first time… we still need multicast.” This is not because of a limitation of MP-BGP EVPN, but it is a characteristic of what is known as the “silent host problem”.

If Host A connects to the network but remains completely silent (sends no traffic, no GARP), the local VTEP has no way of learning it. Since the VTEP doesn’t know Host A exists, it cannot advertise a Type 2 route. So when Host B tries to ARP for Host A, Leaf B checks its BGP table but finds no entry. So then Leaf B must flood that ARP request (via multicast or ingress replication) to discover the silent host.

Once Host A replies, the local VTEP learns it, advertises the Type 2 route, and all subsequent ARPs are suppressed.
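
To make the sequence concrete, here is a minimal Python sketch of the ARP suppression decision on Leaf B. The `Leaf` class, its method names, and the IP and MAC values are hypothetical and only model the logic described above; this is not how any particular switch implements it.

```python
class Leaf:
    def __init__(self, name):
        self.name = name
        self.type2_routes = {}  # host IP -> MAC, populated from EVPN Type 2 routes

    def receive_type2(self, ip, mac):
        """Install an IP-MAC binding advertised by a remote VTEP."""
        self.type2_routes[ip] = mac

    def handle_arp_request(self, target_ip):
        """Answer locally if the binding is known, otherwise fall back to flooding."""
        mac = self.type2_routes.get(target_ip)
        if mac is not None:
            return ("suppressed", mac)
        return ("flooded", None)


leaf_b = Leaf("LEAF-B")

# Host A is silent: no Type 2 route exists yet, so the ARP request must be flooded.
first = leaf_b.handle_arp_request("192.168.10.1")

# Host A replies, Leaf A learns it and advertises a Type 2 route, Leaf B installs it.
leaf_b.receive_type2("192.168.10.1", "00:50:56:aa:aa:aa")

# Later ARP requests for Host A are answered by Leaf B without flooding.
second = leaf_b.handle_arp_request("192.168.10.1")
```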

But the silent host problem is not the only reason to provision for BUM traffic. You must still provision for true broadcast traffic used by protocols such as DHCP, and some NetBIOS traffic, as well as for true multicast traffic, including streaming protocols, and some routing protocols.

Now you can eliminate underlay multicast entirely by using ingress replication (head-end replication), where the ingress VTEP unicasts copies to all remote VTEPs. The replication list is built automatically via EVPN Type 3 (Inclusive Multicast Ethernet Tag) routes. However, ingress replication is not very scalable, because the ingress VTEP must send one copy per remote VTEP, so it should be used only for a limited number of implementations.
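
As a rough sketch of why ingress replication scales poorly, the following Python model builds a per-VNI replication list from received Type 3 routes and then replicates one BUM frame. The function names, the VNI number, and the VTEP addresses are assumptions chosen for illustration.

```python
replication_list = {}  # VNI -> set of remote VTEP IPs


def receive_type3(vni, originator_ip):
    """Add the originating VTEP to the flood list for this VNI."""
    replication_list.setdefault(vni, set()).add(originator_ip)


def replicate_bum(vni, frame):
    """Return one unicast copy of the frame per remote VTEP in the VNI."""
    return [(vtep_ip, frame) for vtep_ip in sorted(replication_list.get(vni, set()))]


# Hypothetical Type 3 routes received for L2 VNI 10010.
receive_type3(10010, "10.0.0.2")
receive_type3(10010, "10.0.0.3")
receive_type3(10010, "10.0.0.2")  # a repeated advertisement does not add a second entry

copies = replicate_bum(10010, "ARP request")
```

Every BUM frame produces as many unicast copies as there are remote VTEPs in the VNI, and all of that work falls on the ingress VTEP.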

So ultimately, the limitation is not in MP-BGP or EVPN, they already carry IP-MAC mappings. The limitation is that you can only advertise endpoints that have been learned; if a host is completely silent, there’s no authoritative source to originate its mapping until someone tries to discover it.

I hope this has been helpful!

Laz

Thanks Laz. You are right and made me understand now… “The limitation is that you can only advertise endpoints that have been learned; if a host is completely silent, there’s no authoritative source to originate its mapping until someone tries to discover it.”

I think I did not express my question well the first time, sorry. My doubt about this lesson arose because Rene creates the downstream networks on the LEAF switches and advertises them under the VRF address family. Why? I get that it is needed for things to work, but why is the Type-2 route not “enough” for the LEAF to use for both L2 forwarding and L3 routing (that is, to use the same route for the MAC lookup and for the IP lookup)? Why do we need different route types for each purpose within MP-BGP?

Thanks