IP SLA and Static Route Load Balancing

Hi All,

I get the SLA configuration and have implemented failover successfully across two ISP links, using static defaults. The behaviour is as expected: primary route fails, secondary is inserted into the RT and all’s well with the world.

My question

What if I wanted to load balance across the two routes rather than fail over? In other words, traffic is load balanced while both routes are tracked; if one route fails, traffic continues over the remaining route.

I’ve tried to do this by adding equal-cost default routes and tracking the SLA object, and by using route maps and source-interface statements in the tracking config, but no luck: when one route fails, nothing is routed. I also tried two different SLAs, each bound to an interface, still no luck.

Note
Each ISP’s router sits on its own VLAN. This router is attached on a trunk as a router on a stick, and each ISP’s router appears behind a subinterface on it.

Here’s a partial config, showing the current working active/failover setup:

track timer interface 5
!
track 1 ip sla 1 reachability
!
interface GigabitEthernet0/1
 no ip address
 ip virtual-reassembly in
 duplex auto
 speed auto
!
interface GigabitEthernet0/1.80
 encapsulation dot1Q 80
 ip address 192.168.80.170 255.255.255.248
 ip nat outside
 ip virtual-reassembly in
!
interface GigabitEthernet0/1.100
 encapsulation dot1Q 100 native
 ip address 192.168.100.254 255.255.255.0
 ip access-group 101 in
 ip nat inside
 ip virtual-reassembly in
!
interface GigabitEthernet0/1.101
 encapsulation dot1Q 101
 ip address 192.168.101.254 255.255.255.0
 ip nat outside
 ip virtual-reassembly in
!
!
ip nat translation timeout 10
ip nat inside source route-map ISP2-NAT interface GigabitEthernet0/1.80 overload
ip nat inside source route-map ISP1-NAT interface GigabitEthernet0/1.101 overload
ip route 0.0.0.0 0.0.0.0 192.168.80.169 track 1
ip route 0.0.0.0 0.0.0.0 192.168.101.253 10
!
ip sla 1
 icmp-echo 8.8.8.8 source-ip 192.168.80.170
!
ip sla schedule 1 life forever start-time now
!
logging source-interface GigabitEthernet0/1.100
access-list 100 permit ip 192.168.200.0 0.0.0.255 any
access-list 100 permit icmp any any echo
access-list 100 permit icmp any any echo-reply
access-list 100 permit icmp any any unreachable
access-list 100 permit tcp any any eq www
access-list 100 permit tcp any any eq 443
access-list 100 permit tcp any any eq 4244
access-list 100 permit tcp any any eq 5222
access-list 100 permit tcp any any eq 5223
access-list 100 permit tcp any any eq 5228
access-list 100 permit tcp any any eq 50318
access-list 100 permit tcp any any eq 59234
access-list 100 permit tcp any any eq 5242
access-list 100 permit udp any any eq domain
access-list 100 permit tcp any any gt 1024
access-list 100 permit tcp any any eq ftp

route-map ISP2-NAT permit 10
 match ip address 100
 match interface GigabitEthernet0/1.101
!
route-map ISP1-NAT permit 10
 match ip address 100
 match interface GigabitEthernet0/1.80
!
!

Any examples? Thoughts?

Thanks,

Ahmed.

Hello Ahmed

You say that the current configuration you posted is working, correct? I suggest you follow these troubleshooting steps:

  1. Change the administrative distance (AD) of the second default route from 10 to 1 (the default for static routes). With both defaults at the same AD, both should be installed in the routing table, which should result in equal-cost load balancing.
  2. Next, verify that load balancing is indeed taking place. You can add some ACLs on the outbound interfaces with permit statements and have them log traffic. Check to see that traffic is indeed being sent from both interfaces. You may also want to check your NAT translations to see that translations are taking place on both interfaces as well.
  3. Once that is done, you can simulate a failure. Don’t shut down the interface on this router; instead, shut down the interface on the next-hop router that holds the 192.168.80.169 IP, so that the failure has to be detected by the SLA probe rather than by the local link state. Check connectivity, and check the access list logs for each interface.
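To make step 1 concrete, here is a minimal sketch of what the two tracked, equal-AD default routes could look like. It is not tested against your setup. Everything for the second link is an assumption on my part: the SLA/track number 2, the probe target 8.8.4.4, and the two host routes that pin each probe to its own link so that a probe cannot succeed by going out through the other ISP.

```
! Assumption: a second probe and track object for the 192.168.101.253 link
ip sla 2
 icmp-echo 8.8.4.4 source-ip 192.168.101.254
!
ip sla schedule 2 life forever start-time now
!
track 2 ip sla 2 reachability
!
! Assumption: host routes so each probe always leaves via the link it monitors
ip route 8.8.8.8 255.255.255.255 192.168.80.169
ip route 8.8.4.4 255.255.255.255 192.168.101.253
!
! Replace the floating static (AD 10) with a tracked route at the default AD
no ip route 0.0.0.0 0.0.0.0 192.168.101.253 10
ip route 0.0.0.0 0.0.0.0 192.168.80.169 track 1
ip route 0.0.0.0 0.0.0.0 192.168.101.253 track 2
```

Keep in mind that the host routes mean 8.8.8.8 and 8.8.4.4 are only ever reached over their pinned link, so pick probe targets you can live without when that link is down. For step 2, `show ip route 0.0.0.0` should list both next hops while both tracks are up, `show track` shows the state of each object, and `show ip nat translations` shows which outside address each flow is using. Also note that CEF load shares per destination by default, so a single flow will always use one link.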

I suspect that the issue has to do with the application of the same NAT translations on multiple load-balanced interfaces, but without doing the testing I cannot be sure. As you follow this procedure, you should see when and under what circumstances the failure takes place, and this should give you a good indication of where the problem is.
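One thing that may be worth checking while you test: in the posted config, ISP2-NAT translates to GigabitEthernet0/1.80 but matches GigabitEthernet0/1.101, and ISP1-NAT is the other way around. Assuming the intent is for each route map to match traffic leaving through the same interface it translates to, the route maps would look like the sketch below. This is only my reading of the intent, so treat it as something to try rather than a confirmed fix.

```
! Existing NAT statements, unchanged
ip nat inside source route-map ISP2-NAT interface GigabitEthernet0/1.80 overload
ip nat inside source route-map ISP1-NAT interface GigabitEthernet0/1.101 overload
!
! Assumption: each route map matches the egress interface used by its NAT statement
route-map ISP2-NAT permit 10
 match ip address 100
 match interface GigabitEthernet0/1.80
!
route-map ISP1-NAT permit 10
 match ip address 100
 match interface GigabitEthernet0/1.101
```

With both defaults active, each flow should then be translated to the address of the interface it actually exits, which you can confirm with `show ip nat translations`.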

Let us know your results and get back to us for further troubleshooting steps if needed.

I hope this has been helpful!

Laz