I’m working with a 5-office WAN and all sites are using BGP to exchange routing information. At the Main site and the DR site, I have firewalls that don’t run BGP so I’m using OSPF on those. The firewalls are configured so that when the Internet is detected as up, the default route will be advertised. At both the Main site and DR, I have L3 switches peered with the firewalls and the switches receive the OSPF default routes properly. I have the DR default route configured with a metric of 254 so it will be less preferred.
The goal is to have all sites, even DR, use the Internet connection at the Main site when it’s up and then have all sites use the DR Internet if the main connection goes down. If I start off with both firewalls advertising default routes via OSPF to the L3 switches, everything is good. The L3 switch at Main receives the OSPF default route from the Main firewall, it gets redistributed into BGP, and the L3 switch at DR (and all of the other sites) see that default route coming from Main and their Internet traffic flows out that way. When I take down the Main Internet connection, the DR OSPF default route kicks in like it should and it gets redistributed into BGP at DR. All of the sites see the new default route coming from the DR site and all of the Internet traffic flows out the DR Internet connection.
Now here’s where I have my problem. When I restore the Main Internet connection, Main and all of the sites except DR see the default route coming from Main and their Internet traffic flows out the Main Internet connection again. However, on the L3 switch at DR, I see that device holding on to the OSPF default route coming from the DR firewall and so the DR site continues to send its Internet traffic out the DR Internet connection. If I go into the DR firewall and tell it to stop advertising the OSPF default route, and then toggle it back on to advertise to reset everything, then the DR site sees the Main default route and things are back to normal.
I’m still learning my way through the finer points of BGP but here’s what I THINK is happening.
When I’m starting from scratch, the DR L3 switch will have two default routes - one learned from BGP with an AD of 20 and one learned from OSPF with an AD of 110. So the BGP default route gets installed into the routing table and everything is good. That eBGP-learned route has a weight of 0. When I take down the Main Internet, that eBGP default route goes away and the OSPF default route gets installed in the table and then redistributed into BGP. BGP sees a redistributed route as locally generated and gives it a weight of 32768. When I restore the Main Internet connection, the Main default route from OSPF will get redistributed into BGP and the DR L3 switch will pick it up again but it will still have a weight of 0 and so it won’t replace the locally generated route with the weight of 32768.
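To make that theory easier to poke holes in, here is a rough Python sketch of how I picture the DR L3 switch behaving. It is only a toy model of my own reasoning, not how any vendor actually implements it: the `DrSwitch` class, its field names, and the `replay` function are all made up for illustration, it only models the weight step of BGP best path selection, and it assumes the AD and weight values I quoted above (20, 110, 0, 32768). It also assumes that on a cold start the eBGP default is already present when the switch first picks a route.

```python
# Values quoted in the post; assumed to be the platform defaults.
AD_EBGP = 20
AD_OSPF = 110
WEIGHT_LEARNED = 0        # path learned from an eBGP peer
WEIGHT_LOCAL = 32768      # path redistributed locally into BGP


class DrSwitch:
    """Toy model of the default route on the DR L3 switch (hypothetical)."""

    def __init__(self):
        self.ebgp_default = False   # default learned via eBGP from Main
        self.ospf_default = False   # default learned via OSPF from DR firewall
        self.rib_owner = None       # protocol that owns 0.0.0.0/0 in the RIB

    def update(self, ebgp_default, ospf_default):
        """Apply new inputs, re-run route selection, return the RIB owner."""
        self.ebgp_default = ebgp_default
        self.ospf_default = ospf_default

        # The OSPF default is only redistributed into BGP while OSPF
        # actually owns the route in the routing table.
        local_path = self.ospf_default and self.rib_owner == "ospf"

        # BGP best path, modelling only the first step: highest weight wins.
        paths = []
        if self.ebgp_default:
            paths.append((WEIGHT_LEARNED, "ebgp"))
        if local_path:
            paths.append((WEIGHT_LOCAL, "local"))
        best = max(paths)[1] if paths else None

        # Only an eBGP-learned best path is offered to the RIB as a BGP
        # route; the local path just mirrors the OSPF route already there.
        candidates = []
        if best == "ebgp":
            candidates.append((AD_EBGP, "ebgp"))
        if self.ospf_default:
            candidates.append((AD_OSPF, "ospf"))

        # Lowest administrative distance wins the RIB entry.
        self.rib_owner = min(candidates)[1] if candidates else None
        return self.rib_owner


def replay():
    """Replay the sequence from the post and record who owns the default."""
    sw = DrSwitch()
    history = []
    history.append(sw.update(ebgp_default=True, ospf_default=True))   # cold start
    history.append(sw.update(ebgp_default=False, ospf_default=True))  # Main Internet down
    history.append(sw.update(ebgp_default=True, ospf_default=True))   # Main Internet restored
    sw.update(ebgp_default=True, ospf_default=False)                  # DR firewall stops advertising
    history.append(sw.update(ebgp_default=True, ospf_default=True))   # DR firewall advertises again
    return history


if __name__ == "__main__":
    print(replay())
```

If my understanding is right, the third entry is the problem case: the eBGP default from Main is back, but the locally redistributed path still wins on weight, so OSPF keeps the route until I toggle the DR firewall.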
I’m hoping someone here can tell me if I’m on the right track with my thinking and if so, how might I go about tweaking things so that the Main OSPF default route that is redistributed into BGP is always preferred over the DR OSPF default route that gets redistributed into BGP.
Thank you for taking the time to read this!