VXLAN Underlay eBGP Multi-AS

Hello Jose

Although this is indeed one of the recommended approaches, it does have its limitations. Using VRF-Lite with subinterfaces (router-on-a-stick, ROAS) on the spine switches recreates the multi-access complexity you avoided at the access layer. So while it may be best practice for certain scenarios, you have correctly identified the limitations and difficulties that can arise.

In typical spine-leaf VXLAN designs, spines should be tenant-agnostic (pure IP underlay with EVPN route reflector function). Border leaves (a dedicated pair, often vPC’d) are the recommended place to terminate tenant VRFs and provide external connectivity. By placing tenant VRFs on spines, you’re violating this separation of concerns, which can hurt scale and operational clarity.

However, in smaller deployments, or under specific design constraints like yours, spine-based external connectivity can work if it is designed properly (vPC, active/standby routing).

Tunnels are probably not a good solution to this. GRE and/or IPsec tunnels add encapsulation overhead, which reduces the effective MTU and creates a fragmentation risk, and they also add operational complexity.
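To put a number on the MTU concern, here is a minimal Python sketch of the arithmetic. It assumes a basic GRE tunnel over IPv4 with no key or sequence fields (24 bytes of overhead). The function names are my own, and any IPsec overhead is left as a caller-supplied value because it varies with cipher and mode.

```python
# Overhead of a basic GRE tunnel over IPv4, in bytes:
# 20-byte outer IPv4 header + 4-byte GRE header (no key/sequence options).
GRE_OVERHEAD = 20 + 4


def max_inner_packet(link_mtu: int, *overheads: int) -> int:
    """Largest inner IP packet that fits in link_mtu after encapsulation.

    Each value in `overheads` is the byte cost of one encapsulation layer.
    IPsec overhead is not hard-coded here; pass it in if you know it.
    """
    return link_mtu - sum(overheads)


def needs_fragmentation(packet_size: int, link_mtu: int, *overheads: int) -> bool:
    """True if an inner packet of packet_size no longer fits once encapsulated."""
    return packet_size > max_inner_packet(link_mtu, *overheads)


if __name__ == "__main__":
    # A full-size 1500-byte packet over a 1500-byte transport link with GRE.
    print(max_inner_packet(1500, GRE_OVERHEAD))
    print(needs_fragmentation(1500, 1500, GRE_OVERHEAD))
```

On a 1500-byte transport link, a full-size 1500-byte tenant packet no longer fits once GRE is added, so you would need to raise the transport MTU or clamp the tenant MTU/MSS.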

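This is indeed a hard limit imposed by 802.1Q: the VLAN ID field is 12 bits wide, which leaves 4094 usable VLAN IDs on any single physical port or port-channel. In production multi-tenant networks with thousands of VRFs, you must therefore distribute tenant VRFs across multiple border device pairs. This is sometimes called VRF sharding. For example: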

  • Border Set A (Spines 1 & 2 or Border Leaves 1 & 2): Handles Customers 1–2000
  • Border Set B (Spines 3 & 4 or Border Leaves 3 & 4): Handles Customers 2001–4000
  • Border Set C: Handles Customers 4001–6000
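The sharding idea above can be sketched in a few lines of Python. This is only an illustration, assuming one VRF and one 802.1Q subinterface per customer. The `BorderSet` class and the `border_set_for` and `validate` helpers are hypothetical names of my own, and the ranges are simply the ones from the example list.

```python
from dataclasses import dataclass

# 12-bit VLAN ID field; IDs 0 and 4095 are reserved, leaving 4094 usable.
MAX_DOT1Q_VLANS = 2**12 - 2


@dataclass(frozen=True)
class BorderSet:
    name: str
    first_customer: int
    last_customer: int

    def size(self) -> int:
        return self.last_customer - self.first_customer + 1


# Ranges taken from the example above (hypothetical customer numbering).
BORDER_SETS = [
    BorderSet("Border Set A", 1, 2000),
    BorderSet("Border Set B", 2001, 4000),
    BorderSet("Border Set C", 4001, 6000),
]


def validate(border_sets: list[BorderSet]) -> None:
    """Fail if any shard needs more subinterfaces than one trunk can carry."""
    for bs in border_sets:
        if bs.size() > MAX_DOT1Q_VLANS:
            raise ValueError(f"{bs.name} exceeds {MAX_DOT1Q_VLANS} VLAN IDs")


def border_set_for(customer_id: int, border_sets: list[BorderSet] = BORDER_SETS) -> BorderSet:
    """Return the border set whose range contains customer_id."""
    for bs in border_sets:
        if bs.first_customer <= customer_id <= bs.last_customer:
            return bs
    raise LookupError(f"no border set handles customer {customer_id}")


if __name__ == "__main__":
    validate(BORDER_SETS)
    print(border_set_for(2500).name)
```

Keeping each shard well below 4094 customers, as in the example, leaves headroom for growth; real platforms may also impose lower per-device VRF or subinterface limits, which I have not modelled here.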

Your WAN edge/SD-WAN router must then connect to each border set over its own physical ports or port-channels, so that every set gets a separate 802.1Q VLAN ID space.

Ideally, we come back to the best practice of creating one or more dedicated border leaf pairs for external connectivity, but only if this is feasible. That way, you keep spines clean and tenant-agnostic, and it allows you to scale out border capacity independently.

As you can see, there are several options you can try, depending on the requirements and limitations of your specific scenario. However, in my opinion, if you need ROAS and tenant VRFs on spines, the fabric has already outgrown that design.

Let me make clear that I don’t have hands-on experience with a deployment similar to the one you are describing, but the above does align with how large-scale VXLAN EVPN fabrics are typically designed. As such, my responses should be taken as guidelines for furthering your troubleshooting and experimentation. I look forward to hearing your results!

I hope this has been helpful!

Laz