VXLAN MP-BGP EVPN L2 VNI

Hello Nicolas

Let me chime in on this conversation as well, it sounds very interesting!

Your approach seems to be a good one, considering you don’t have a traditional WAN solution like MPLS. Using DWDM Point-to-Point links for connecting different DC borders is a great way to ensure high-speed, low-latency connections. Since DWDM operates at the physical layer, it provides a transparent, “protocol-agnostic” and low-latency transport, which is ideal for data center interconnectivity.

Layer 2 VNI with static ingress replication may not be the best choice for BUM (broadcast, unknown unicast, and multicast) traffic, because the ingress VTEP has to send a separate unicast copy of each such frame to every remote VTEP configured with that VNI. In a small network, it is probably the best option because of its simplicity, but as the network grows, this replication load becomes a scalability concern.
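To put rough numbers on that scaling concern, here is a minimal Python sketch that counts how many copies of a single BUM frame the source VTEP itself has to transmit. It assumes one unicast copy per remote VTEP for ingress replication, and a multicast-enabled underlay that replicates the frame in the network. The function name and mode strings are illustrative only, not taken from any vendor CLI.

```python
def copies_from_source_vtep(vteps_in_vni: int, mode: str) -> int:
    """Copies of one BUM frame the source VTEP transmits into the underlay.

    vteps_in_vni counts every VTEP in the VNI, including the source.
    (Hypothetical helper for illustration.)
    """
    remote_vteps = vteps_in_vni - 1
    if remote_vteps <= 0:
        return 0  # nobody else in the VNI to flood to
    if mode == "ingress-replication":
        return remote_vteps  # one unicast VXLAN copy per remote VTEP
    if mode == "multicast":
        return 1  # one copy to the group; the underlay replicates it
    raise ValueError(f"unknown replication mode: {mode}")


for n in (3, 10, 50):
    print(n,
          copies_from_source_vtep(n, "ingress-replication"),
          copies_from_source_vtep(n, "multicast"))
```

With 50 VTEPs in the VNI, the source VTEP sends 49 copies of every broadcast under ingress replication versus a single copy with multicast, which is why ingress replication is usually fine for small fabrics but gets expensive as they grow.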

Establishing BGP on top of that for the control plane is also a good decision. BGP is a robust, scalable protocol that can carry a large number of routes and supports rich routing policy, which makes it well suited to the control plane.

Forming VTEP peering on top of that for data plane traffic using L2VPN can also work well. This setup allows for the decoupling of the physical network (underlay) from the virtual network (overlay), providing flexibility and simplifying the network architecture.

Overall, this setup should be able to provide a robust, scalable, and flexible inter-DC solution. However, keep in mind that our discussion is a very high-level and hypothetical one. Every network is unique and the requirements of each will vary, so it's always a good idea to thoroughly test any suggested setup in a lab environment before deploying it in production.

I hope this has been helpful!

Laz


Hello.

Thanks for the lesson. I'm not understanding the benefits of this, and I was hoping for some help understanding them.

When HOST1 tries to reach HOST2 for the first time, it sends an ARP Request broadcast as expected. When the broadcast gets to LEAF1, I'm assuming that, because I configured Flood and Learn Multicast, it ends up reaching the LEAF2 VTEP. Is this correct? It's not thanks to BGP, right? At least that is what it looks like from my packet captures…


But then, what benefit does this provide, if for every remote destination not cached in HOST1's ARP table we still rely on multicast flood and learn, since ARP Requests are broadcast? I might be missing something…

Thanks,

Jose

Hello Jose

You’re right, when HOST1 sends that first ARP Request for HOST2, the broadcast IS being flooded via the multicast group (or ingress replication if configured that way), NOT directly via BGP. BGP EVPN is a control-plane protocol, and as such, it doesn’t carry the actual ARP packet payload. The flooding you see in your packet capture is handled by the VXLAN data plane using whatever BUM replication method you’ve configured.

So what's the benefit of EVPN? The key benefit appears when ARP suppression (a form of proxy ARP) is enabled. Here's how the complete workflow works:

Phase 1: Initial Discovery (What You Saw)

  • HOST1 needs to reach HOST2 but has no ARP entry
  • HOST1 sends an ARP Request (broadcast)
  • LEAF1 receives the broadcast and checks its EVPN database
  • LEAF1 has no MAC/IP binding for HOST2 yet (HOST2 is a “silent host” that hasn’t communicated)
  • LEAF1 must flood the ARP Request to all VTEPs in that L2VNI using:
    • Multicast (to the group associated with the VNI), OR
    • Ingress replication (unicast copies to each remote VTEP learned via BGP EVPN Type-3 IMET routes)
  • LEAF2 receives the flooded ARP, forwards it locally, and HOST2 receives it
  • HOST2 sends an ARP Reply (unicast) back to HOST1
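For the ingress replication case in Phase 1, the steps above can be sketched in Python. This is a simplified model that assumes each VTEP's interest in a VNI is known only from EVPN Type-3 (IMET) routes. The `ImetRoute` class, the helper functions, and all addresses and VNI numbers are hypothetical. The sketch also shows the point made earlier: BGP only builds the flood list, and the ARP frame itself travels in the VXLAN data plane. With multicast replication, LEAF1 would instead send a single copy to the VNI's group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImetRoute:
    """Simplified EVPN Type-3 (IMET) route: 'this VTEP wants BUM traffic for this VNI'."""
    vni: int
    originator_vtep: str  # VTEP source/loopback IP (hypothetical values below)


def build_flood_list(imet_routes, vni, local_vtep):
    """Remote VTEPs that advertised interest in this VNI via Type-3 routes."""
    return sorted({r.originator_vtep for r in imet_routes
                   if r.vni == vni and r.originator_vtep != local_vtep})


def flood_bum(frame, vni, imet_routes, local_vtep):
    """Head-end replication: one unicast VXLAN copy per remote VTEP in the flood list."""
    return [(vtep, vni, frame)
            for vtep in build_flood_list(imet_routes, vni, local_vtep)]


routes = [
    ImetRoute(10100, "10.0.0.1"),  # LEAF1 (local)
    ImetRoute(10100, "10.0.0.2"),  # LEAF2
    ImetRoute(10100, "10.0.0.3"),  # another leaf in the same L2VNI
    ImetRoute(10200, "10.0.0.4"),  # different VNI: not in this flood list
]
for copy in flood_bum("ARP who-has HOST2", 10100, routes, local_vtep="10.0.0.1"):
    print(copy)
```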

This process is unavoidable, and at this point it is no more efficient than regular flood-and-learn, so it looks like there is no benefit at all. The benefit comes next:

Phase 2: BGP Learning

  • When HOST2 sends that reply, LEAF2 learns HOST2’s MAC and IP address locally
  • LEAF2 immediately advertises this as a BGP EVPN Type-2 route (MAC/IP Advertisement) to all other VTEPs
  • LEAF1 receives this Type-2 route via BGP and installs an entry: “IP 192.168.1.X → MAC XX:XX:XX:XX:XX:XX → behind LEAF2 VTEP”
  • If ARP suppression is enabled, this also creates an ARP suppression entry on LEAF1
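Phase 2 can be modeled as a small Python sketch of what LEAF1 does when a Type-2 route arrives. This is a simplified illustration, not any vendor's implementation: the `MacIpRoute` and `LeafTables` names, the table layouts, and the MAC/IP/VTEP values (standing in for the `192.168.1.X` / `XX:XX:XX:XX:XX:XX` placeholders) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MacIpRoute:
    """Simplified EVPN Type-2 (MAC/IP Advertisement) route."""
    vni: int
    mac: str
    ip: Optional[str]   # a Type-2 route can carry a MAC only, or MAC + IP
    next_hop_vtep: str  # VTEP that advertised the host


@dataclass
class LeafTables:
    # (vni, mac) -> remote VTEP: MAC entries learned through the control plane
    mac_table: Dict[Tuple[int, str], str] = field(default_factory=dict)
    # (vni, ip) -> mac: entries used to answer ARP Requests locally
    arp_suppression: Dict[Tuple[int, str], str] = field(default_factory=dict)

    def install_type2(self, route: MacIpRoute, suppress_arp: bool) -> None:
        # MAC installed from BGP, so no data-plane flood-and-learn is needed
        self.mac_table[(route.vni, route.mac)] = route.next_hop_vtep
        # Only MAC+IP routes feed the ARP suppression cache, and only if enabled
        if suppress_arp and route.ip is not None:
            self.arp_suppression[(route.vni, route.ip)] = route.mac


leaf1 = LeafTables()
# Hypothetical Type-2 route for HOST2, advertised by LEAF2
leaf1.install_type2(
    MacIpRoute(10100, "00:00:00:00:00:02", "192.168.1.2", "10.0.0.2"),
    suppress_arp=True,
)
print(leaf1.mac_table)
print(leaf1.arp_suppression)
```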

Phase 3: The Real Benefit (ARP Suppression)
Now imagine that 5 minutes later HOST1's ARP cache entry times out, or that a different host (HOST3) connected to LEAF1 tries to reach HOST2. With ARP suppression:

  • HOST3 sends an ARP Request broadcast for HOST2
  • LEAF1 intercepts the broadcast
  • LEAF1 checks its EVPN-learned database and finds HOST2’s MAC/IP binding
  • LEAF1 suppresses the broadcast—it does NOT flood it across the fabric
  • LEAF1 replies directly to HOST3 with HOST2’s MAC address (acting as a proxy)
  • Zero flooding for this ARP request
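The decision LEAF1 makes in Phase 3 fits in a few lines of Python. In this minimal sketch, the ARP suppression cache is assumed to be a plain dictionary keyed by `(vni, ip)` and populated from Type-2 routes. The function name, return values, and addresses are hypothetical.

```python
from typing import Dict, Optional, Tuple

ArpCache = Dict[Tuple[int, str], str]  # (vni, ip) -> mac, filled from Type-2 routes


def handle_arp_request(arp_cache: ArpCache, vni: int,
                       target_ip: str) -> Tuple[str, Optional[str]]:
    """What the ingress VTEP does with an ARP Request from a local host."""
    mac = arp_cache.get((vni, target_ip))
    if mac is not None:
        # Binding known via EVPN: reply on behalf of the target, nothing is flooded
        return ("proxy-reply", mac)
    # Silent host: no binding yet, so fall back to BUM replication
    return ("flood", None)


cache: ArpCache = {(10100, "192.168.1.2"): "00:00:00:00:00:02"}  # hypothetical HOST2 entry
print(handle_arp_request(cache, 10100, "192.168.1.2"))  # → ('proxy-reply', '00:00:00:00:00:02')
print(handle_arp_request(cache, 10100, "192.168.1.9"))  # → ('flood', None)
```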

So, to summarize:

  • Without EVPN (pure flood-and-learn), every ARP Request for every destination is flooded to all VTEPs, and it is flooded again each time an ARP cache entry times out.
  • With EVPN but without ARP suppression, MAC addresses are learned through the control plane, which reduces unknown-unicast flooding, but ARP Requests are still flooded.
  • With EVPN and ARP suppression, only the first ARP Request for a silent host is flooded; all subsequent ARP Requests for that host are answered locally by the ingress VTEP.

Make sense?
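The three cases can be compared with a toy Python counter that replays the scenario from Phase 3: HOST1 asks for HOST2, HOST1 asks again after its cache times out, then HOST3 asks. The model is deliberately simplified and its assumptions are hypothetical: every target answers its first ARP Request, and with EVPN the resulting Type-2 route stays installed for the whole run. It only counts flooded ARP Requests, so it does not capture the unknown-unicast reduction that EVPN gives even without ARP suppression.

```python
def count_flooded_arps(targets, evpn: bool, suppress_arp: bool) -> int:
    """Count ARP Requests that LEAF1 floods across the fabric (toy model)."""
    learned_via_bgp = set()
    flooded = 0
    for target in targets:
        if evpn and suppress_arp and target in learned_via_bgp:
            continue  # answered locally by the ingress VTEP
        flooded += 1  # broadcast replicated to the remote VTEPs
        if evpn:
            learned_via_bgp.add(target)  # the reply triggers a Type-2 advertisement
    return flooded


# HOST1 -> HOST2, HOST1 -> HOST2 after cache timeout, HOST3 -> HOST2
requests = ["HOST2", "HOST2", "HOST2"]
print(count_flooded_arps(requests, evpn=False, suppress_arp=False))  # → 3
print(count_flooded_arps(requests, evpn=True, suppress_arp=False))   # → 3
print(count_flooded_arps(requests, evpn=True, suppress_arp=True))    # → 1
```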

I hope this has been helpful!

Laz
