I ran into this today at work and I thought it was worth blogging about. So let’s take this example topology…
Let’s say that router1 is an edge router providing some service out in a DMZ. Let’s also assume that router2 is actually a firewall (in my lab example we’ll use a router). Router3 is a core edge or distribution layer device that’s fronting a data center. The data center has a local prefix of 10.65.0.0/20 which has been allocated to it for its use.
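To keep the pieces straight, here’s a rough sketch of the lab as described. Exactly where 10.65.0.1 sits on router3 (the link toward router2 or a loopback) doesn’t really matter for the story, and the router2-to-router3 link addressing never shows up in my configs, so I’ve left it out…

[DMZ] router1 --- 172.172.172.0/30 --- router2 (firewall) --- router3 --- [DC 10.65.0.0/20]
      172.172.172.1              172.172.172.2                10.65.0.1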
For reasons that don’t really matter, we want to set up an eBGP peering between router1 and router3 to learn more specific prefixes from devices that will hang off of router1. In turn, we want to learn the more specific prefix for the local data center from router3.
Nothing too crazy at this point. Router1 is shipped to the DC with a base config, racked, cabled, and powered. The base config includes a few routes that will allow you to SSH into the device once it’s powered up. Namely, there’s a route that looks like this…
ip route 10.0.0.0 255.0.0.0 172.172.172.2 name RFC1918
So all is good, I start the config, then I go to turn up eBGP. Since router2 is actually a firewall (and a layer 3 hop), I request a rule for BGP and also enable eBGP multihop on the peers. My config looks something like this on each router…
router1#show run | sec router bgp
router bgp 65302
no synchronization
bgp log-neighbor-changes
neighbor 10.65.0.1 remote-as 65367
neighbor 10.65.0.1 ebgp-multihop 2
no auto-summary
router1#
router3#show run | sec router bgp
router bgp 65367
no synchronization
bgp log-neighbor-changes
redistribute static
neighbor 172.172.172.1 remote-as 65302
neighbor 172.172.172.1 ebgp-multihop 2
no auto-summary
router3#
Nothing crazy here right? Router 3 also has a summary route telling it how to get to the 172.172.172.0/30 network so the peering comes up as we expect…
But then something weird starts happening…
The route in the BGP RIB starts going from accessible to inaccessible. So what do I do? Blame the firewall, of course. It didn’t help that the security guys told me they had random issues passing BGP. So I sort of left things where they were while the security team worked to resolve the firewall issues. Then, after a while, I started looking at things a little closer. The red flag went up when I saw this occurring…
Have you figured out what the problem is? It took me a minute, but then I realized that I was learning a better prefix to reach my BGP peer through BGP. This is what was going on…
1. I used the 10.0.0.0/8 route to get to 10.65.0.1 initially and set up the BGP session
2. Once the BGP session came up, I learned the prefix 10.65.0.0 /20 from router3. Since it was a more specific prefix, BGP attempts to load the prefix into the FIB.
3. The entry attempts to load into the FIB but CEF realizes that it can’t resolve a valid next hop.
4. CEF marks the entry as invalid and pulls the BGP route out of the FIB, and the router falls back to the 10.0.0.0/8 route to get to its BGP peer.
5. On the next BGP import, the same thing happens all over again (you can watch the cycle with the commands below)
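If you want to watch the same flap in a lab, a couple of standard IOS show commands are enough; the addresses are the ones from this example…

show ip bgp 10.65.0.0 255.255.240.0
show ip route 10.65.0.1
show ip cef 10.65.0.1

Right after the prefix is imported, the BGP entry lists its next hop as inaccessible, and the lookups for 10.65.0.1 bounce between the /8 and the /20 as the route is installed and then withdrawn.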
It’s sort of hard to wrap your head around if you don’t think about it the right way. When router3 advertises the route to router1, it’s essentially saying…
“Here’s a route for 10.65.0.0/20, to get there you need to come to me (10.65.0.1)”
As we know, a route in BGP can have a non-directly connected next hop. When this happens, CEF has to resolve the next-hop to a directly connected one so it knows where to forward the frames. So the router now has a problem. It’s hearing…
“To get to 10.65.0.0 /20 come to 10.65.0.1”
from router3. The 10.65.0.0/20 network contains the peer IP address of router3 (10.65.0.1), so once that route is installed it becomes the longest match for the peer, and the router can no longer use the 10.0.0.0/8 route to reach 10.65.0.1. At this point we’re stuck. You have a route, but you can’t resolve it since the next hop isn’t directly connected, and the means to get to a valid next hop (10.0.0.0/8 pointing to 172.172.172.2) can’t be used since it’s less specific.
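Spelled out as a plain routing lookup, here’s the dilemma the router faces every time it tries to resolve the next hop 10.65.0.1…

Resolving 10.65.0.1 (longest match wins):
  10.65.0.0/20  via 10.65.0.1      <- the BGP route itself; its next hop is the very address we’re resolving
  10.0.0.0/8    via 172.172.172.2  <- resolvable, but less specific, so it never gets used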
The fix for this is to put a static /32 route for the BGP peer on router1 pointing at the directly connected next hop (172.172.172.2).
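On router1 that looks something like this (the name tag is just a label of my own, mirroring the style of the base config’s RFC1918 route)…

ip route 10.65.0.1 255.255.255.255 172.172.172.2 name BGP-PEER

Since a /32 is the longest match possible, the learned 10.65.0.0/20 can never override the path to the peer, and the BGP next hop stays resolvable.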
This can of course happen in other ways. You could be running an IGP and have eBGP put a route in place that wins over the IGP path to the peer, either because it’s more specific or because of eBGP’s lower admin distance, and cause the same problem.
An interesting problem and one that I’m glad I ran into!