BGP


BGP path selection

One of my favorite things about BGP is how flexible it is.  The other thing I love is that it’s really easy to see all of the possible paths the router knows about.  However, when it all boils down, BGP has to select the best path (or paths, if you raise maximum-paths above its default of 1).  What I’d like to do in this post is briefly describe the best path selection process and then walk through some practical examples.  There are tons of sites out there that cover the best path selection process, so I don’t want to beat a dead horse.  I’ll cover the big hitters and provide examples as I go, reusing the lab from the last post…

image

Let’s get right into it…

1 – Weight
Weight is the first value compared in the best path decision, and the higher weight wins.  Weight is a Cisco proprietary parameter and is only significant to the router on which you configure it; it is never advertised to peers.  Learned routes default to a weight of 0.  Weight can be configured for specific prefixes by applying a route-map inbound on a neighbor, or for all prefixes received from a neighbor by setting the neighbor’s weight.  Let’s take a look at doing it both ways on router5…

image

As you recall, router5 was preferring the older path through router4.  Let’s increase the weight on router5’s peering to router6…

image

Above you can see that we set the weight of neighbor 10.0.0.5 to 5.  Once we cleared BGP, the weight for all prefixes learned from 10.0.0.5 shows as 5.  Also note that this is now the preferred path for router5.  Now, let’s configure a route-map to change the weight of a specific prefix…

image

First we configure a prefix list to match just the 192.168.1.1/32 prefix.  We then use it as the match clause in our route-map and set the weight to 10.  I then go into BGP and apply the route-map inbound on the 10.0.0.2 neighbor.  It takes a second to apply, but in the second ‘show ip bgp’ output you can see that the route-map has taken effect and that the 192.168.1.1/32 network is now preferred through 10.0.0.2.  Note that since I didn’t add a second permit entry to the route-map, I no longer receive the 172.64.1.0/24 prefix from router4: every route-map ends with an implicit deny, so anything that doesn’t match the prefix list is dropped.
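To make the implicit deny concrete, here is a minimal Python sketch of how an inbound route-map decides a received prefix’s weight. It is a model, not IOS code: the `RouteMapEntry` class and `apply_inbound` function are hypothetical, the prefix list is simplified to exact-prefix matching (no ge/le), and the neighbor weight of 0 assumes 10.0.0.2 has no `neighbor ... weight` configured.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network
from typing import Optional


@dataclass
class RouteMapEntry:
    """One route-map sequence: permit/deny, an optional exact-prefix match, an optional weight."""
    seq: int
    permit: bool
    match_prefixes: Optional[set] = None  # None means "match everything"
    set_weight: Optional[int] = None


def apply_inbound(entries, prefix: IPv4Network, neighbor_weight: int = 0):
    """Return the weight a received prefix ends up with, or None if it is filtered.

    Entries are evaluated in sequence order and the first match decides.
    A prefix that matches no entry is dropped (the implicit deny).
    """
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.match_prefixes is None or prefix in entry.match_prefixes:
            if not entry.permit:
                return None
            return entry.set_weight if entry.set_weight is not None else neighbor_weight
    return None  # implicit deny at the end of every route-map


host = ip_network("192.168.1.1/32")
static = ip_network("172.64.1.0/24")

# A single permit entry, like the one applied on router5 for neighbor 10.0.0.2.
only_one = [RouteMapEntry(10, True, {host}, set_weight=10)]
print(apply_inbound(only_one, host))    # 10
print(apply_inbound(only_one, static))  # None: filtered by the implicit deny

# A catch-all permit entry lets unmatched prefixes through with the neighbor's weight.
with_catch_all = only_one + [RouteMapEntry(20, True)]
print(apply_inbound(with_catch_all, static))  # 0
```

The catch-all entry is the same trick used in the local preference example that follows.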

2 – Local Preference
Local preference is a means to influence which exit an AS uses for its outbound traffic.  Unlike weight, the local preference attribute is shared with all iBGP peers in the AS, and the higher value wins.  For instance, say that we want all of the routers in AS 100 to use the path through router5 to get to the prefixes attached to router1.  Let’s look at the BGP table on router3 to see the current best path…

image

As you can see, AS 100 is currently preferring the route up through router2 to the prefixes.  We’re using the next-hop-self command so the next hop is showing as the iBGP peer address of router2.  Now, let’s make some changes…

image

Here we match one of the prefixes being advertised to the AS (172.64.1.0/24) and set its local preference to 200.  The default local preference is 100 and higher values win, so our path should be selected.  We configure a second permit route-map entry to pass all non-matching prefixes and then apply the route-map inbound on router4’s eBGP session.  The result can be seen on router2 and router3…

image

image

The first ‘show ip bgp’ in each block was taken before we made the changes and the second was taken directly afterwards.  Note how each router is now showing the LocPref of 200 for the 172.64.1.0/24 prefix as well as having selected it as the best path.  Much like weight, local preference can also be configured on a more global level.  For instance, I can set the default local preference for the entire router.  Let’s try that on router2…

image

The result of this can be seen on all routers in the AS but let’s just take a look at router4…

image

Note how router4 is now showing the LocPref of 300 for the routes from router2.  Also note that the preferred path is now through router2 for both of the prefixes. 
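Here is a rough Python model of the two local preference knobs used above: a value set by a route-map sticks to that path, and anything else gets the default of whichever edge router learned it. The `Path` class, `best_by_local_pref` function, and the `defaults` dictionary are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_LOCAL_PREF = 100  # IOS default when nothing else is configured


@dataclass
class Path:
    prefix: str
    via: str                          # which AS 100 edge router learned the path
    local_pref: Optional[int] = None  # None = not set by a route-map


def effective_local_pref(path: Path, router_default: int) -> int:
    """A route-map 'set local-preference' wins; otherwise the edge router's default applies."""
    return path.local_pref if path.local_pref is not None else router_default


def best_by_local_pref(paths, defaults):
    """Pick the path with the highest local preference (higher is better).

    `defaults` maps an edge router to its 'bgp default local-preference' value;
    routers not listed use 100.
    """
    return max(
        paths,
        key=lambda p: effective_local_pref(p, defaults.get(p.via, DEFAULT_LOCAL_PREF)),
    )


paths = [
    Path("172.64.1.0/24", "router2"),
    Path("172.64.1.0/24", "router4", local_pref=200),  # set by router4's inbound route-map
]
print(best_by_local_pref(paths, {}).via)                # router4: 200 beats the default 100
print(best_by_local_pref(paths, {"router2": 300}).via)  # router2: 300 beats 200
```

The second call mirrors the router2 change: raising the router-wide default to 300 outranks the 200 that the route-map set on router4.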

3 – Prefer locally originated routes
This one is pretty straightforward.  Basically, you prefer routes that are locally originated over routes that are learned.  This is easy to understand, since locally originated routes have a default weight of 32768.  We can see this on router1…

image

Routes can be originated using the network, redistribute, and aggregate commands.  If there are multiple local BGP routes, they are preferred in this order:
1 – Routes introduced with the network command
2 – Routes introduced with the redistribute command
3 – Routes introduced with the aggregate command

In the example below, I introduced the 172.64.1.0/24 prefix on router3 via both the aggregate command and the redistribute command…

image

Looking at the specific BGP table entry we can see that the redistributed route beat out the aggregate route…

image

If we were to introduce the prefix with the network statement instead of the aggregate command, that route would become the best path.
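The ordering from the list above fits in a few lines of Python. This sketch simply encodes the post’s ranking of local sources (network, then redistribute, then aggregate); the `LOCAL_SOURCE_RANK` table and `best_local_source` function are hypothetical and not taken from any IOS implementation.

```python
# Lower rank = preferred, following the ordering listed above.
LOCAL_SOURCE_RANK = {"network": 1, "redistribute": 2, "aggregate": 3}


def best_local_source(sources):
    """Return the preferred way a prefix was locally injected into BGP."""
    return min(sources, key=LOCAL_SOURCE_RANK.__getitem__)


print(best_local_source(["aggregate", "redistribute"]))             # redistribute
print(best_local_source(["aggregate", "redistribute", "network"]))  # network
```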

4 – Shortest AS-Path
This one is pretty self-explanatory: you prefer the path with the shortest AS-Path length.  This attribute is generally used to influence inbound routing through AS-Path prepending.  For instance, let’s say that router1 would really prefer all of the traffic from the other autonomous systems to come in through router2 under normal operation.  We could prepend extra AS hops to the advertisement sent to router7 to make that path less appealing.  Let’s try that out and see what the results are…

image

Here we create a route-map that prepends ‘400 400 400’ to the AS-Path list.  We then set it outbound on the router7 BGP peer.  Now let’s take a look at a couple of the BGP tables on other routers…

image

Router7 still sees the path directly to router1, but now prefers the route down to router6.  Note that the path through router1 has a substantially longer AS-Path at this point…

image

Router5 now prefers the route through router4.  Notice that router5 no longer has a route in its BGP table through router6.  This is because router6’s best path is now through router5.  Router6 advertises only its best path to the prefix, and router5 drops that advertisement since it sees its own AS in the AS-Path list.
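The prepending and loop-detection behavior described above can be traced with lists of AS numbers. This is a simplified Python sketch that assumes all earlier tie-breakers are equal. AS 400 and AS 100 come from the lab, but router5’s AS and router6/router7’s AS are only shown in the diagram, so the values 500 and 600 below are hypothetical placeholders.

```python
def advertise(as_path, sender_as, extra_prepends=0):
    """AS path an eBGP neighbor receives: any route-map prepends plus the sender's own AS, in front."""
    return [sender_as] * (1 + extra_prepends) + list(as_path)


def accept(as_path, receiver_as):
    """eBGP loop prevention: drop any advertisement that already contains our own AS."""
    return receiver_as not in as_path


AS_R1, AS_100 = 400, 100
AS_R5, AS_R67 = 500, 600  # hypothetical: these AS numbers only appear in the lab diagram

direct = advertise([], AS_R1, extra_prepends=3)  # router1 -> router7 with '400 400 400' prepended
via_as100 = advertise([AS_R1], AS_100)           # router4 -> router5
via_r5 = advertise(via_as100, AS_R5)             # router5 -> router6, passed to router7 over iBGP unchanged

paths = {"router1 (direct)": direct, "router6": via_r5}
best = min(paths, key=lambda name: len(paths[name]))
print(direct)  # [400, 400, 400, 400]
print(best)    # router6

back_to_r5 = advertise(via_r5, AS_R67)           # router6 -> router5
print(accept(back_to_r5, AS_R5))                 # False: router5 sees its own AS
```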

5 – Lowest Origin type
Here we look at where the prefix came from.  Redistributed routes have an origin of incomplete (‘?’), while routes originated via the network or aggregate commands have an origin of IGP (‘i’).  There’s also EGP (‘e’), but that isn’t used anymore.  BGP prefers the lowest origin type in the order IGP, then EGP, then incomplete, so a route with an origin of IGP beats one with an origin of incomplete.
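A tiny Python sketch of that origin ranking, using the codes shown in ‘show ip bgp’. The `prefer_origin` function and the path names are hypothetical, and the sketch assumes every earlier step has already tied.

```python
# Origin codes in order of preference (lower is better): IGP < EGP < incomplete.
ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}


def prefer_origin(paths):
    """Return the (name, origin_code) pair with the best origin."""
    return min(paths, key=lambda p: ORIGIN_RANK[p[1]])


print(prefer_origin([("redistributed copy", "?"), ("network copy", "i")]))  # ('network copy', 'i')
```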

6 – Lowest MED
The MED (Multi-Exit Discriminator) is used by a multihomed AS to influence how traffic enters it.  The lower the MED, the better.  Since the MED attribute is non-transitive, it is only of use when you are dual-homed to the same AS.  For instance, if you have redundant paths to the same AS, you could use MED to tell the neighboring AS which path to use to reach your prefixes.  That being said, we can’t really use MED in our existing topology without tweaking it slightly…

image

Let’s assume that we want traffic from AS 100 to enter AS 400 through the new link we just created.  We can do this by advertising an increased metric from router1 to router2.

image

Since the default metric is 0, we need to advertise a higher metric over the path we’d prefer not be used.  In this case, we advertise a metric of 100 to router2.  Router2 shares that metric within the AS, and all BGP routers in AS 100 decide to use the path through router4 on the new link…

image

The output above shows the BGP table on router2 before and after the change.  As you can see the metric is now 100 after we did a soft clear on the BGP session.  Router3 and router4 look similar…

image

image

Note that neither router3 nor router4 actually has a path with the metric set to 100.  This is because router2 is no longer advertising the route to router3 and router4, since it isn’t router2’s best path.
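Because MED is meant for comparing entry points into the same neighboring AS, Cisco by default only compares it between paths whose first AS matches (`bgp always-compare-med` changes that). Here is a rough Python sketch of that grouping. The `Path` class and `lowest_med` function are hypothetical, and the AS 500 path in the second example is an invented third path, included only to show that it is not compared against AS 400’s MEDs.

```python
from dataclasses import dataclass


@dataclass
class Path:
    learned_from: str
    as_path: list
    med: int = 0  # a missing MED is treated as 0 by default


def lowest_med(paths):
    """Keep the lowest-MED paths within each neighboring AS (first AS in the path).

    Paths from different neighboring ASes are not compared against each other,
    mirroring the default behavior without 'bgp always-compare-med'.
    """
    groups = {}
    for p in paths:
        groups.setdefault(p.as_path[0] if p.as_path else None, []).append(p)
    survivors = []
    for group in groups.values():
        low = min(p.med for p in group)
        survivors.extend(p for p in group if p.med == low)
    return survivors


paths = [Path("router2", [400], med=100), Path("router4", [400], med=0)]
print([p.learned_from for p in lowest_med(paths)])  # ['router4']

mixed = paths + [Path("other", [500, 400], med=50)]  # hypothetical path via another AS
print([p.learned_from for p in lowest_med(mixed)])   # ['router4', 'other']
```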

7 – eBGP over iBGP
This is another rule that just makes sense.  Using an eBGP-learned route is a quicker way out of the AS than following an iBGP route somewhere else in the AS to reach the prefix.  If you have an eBGP route for a prefix, you should use it over any iBGP-learned routes for the same prefix.

8 – Prefer the lowest IGP metric to BGP next hop
This one is also pretty straightforward.  If you’ve gotten this far, compare the IGP metric to each BGP next hop; the lowest one wins.  Suppose we advertise the prefix 10.20.30.0/24 from router6 and remove the next-hop-self commands from router2 and router4 in AS 100.  That way, router3 will see the BGP next hops for the prefix as 10.0.0.1 and 10.0.0.25…

image

If we look at the route metrics for the BGP next hops, we can see that they are currently the same…

image

Let’s increase the delay on router2’s interface facing router1…

image

Doing this will increase the EIGRP cost to get to 10.0.0.25.  If we look at router3 again, we’ll see that the cost of the route to 10.0.0.24 has indeed increased and the BGP table now prefers the route through the BGP next hop of 10.0.0.9…

image

9 – Multipaths
This step just has the router check whether multipath is enabled (maximum-paths greater than 1).  If it is, the router can install multiple equal paths and load balance across them.  This doesn’t change the fact that the router will only advertise the best path to its BGP peers, though.
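Here is a simplified Python sketch of that split between installing and advertising. Each path carries an `attrs` tuple standing in for the attributes compared before this step (lower is better here) and a `tiebreak` value standing in for the oldest-path and router-ID steps. The function name, the path names, and the tuple encoding are all hypothetical, and real IOS multipath eligibility has more conditions than "equal attrs".

```python
def multipath(paths, maximum_paths=1):
    """Return (installed, advertised) for one prefix.

    Each path is (name, attrs, tiebreak). Paths whose attrs equal the best path's
    attrs are candidates for load balancing, up to maximum_paths; only the single
    best path is ever advertised to peers.
    """
    ranked = sorted(paths, key=lambda p: (p[1], p[2]))
    best = ranked[0]
    equal = [p for p in ranked if p[1] == best[1]]
    installed = [p[0] for p in equal[:maximum_paths]]
    return installed, best[0]


paths = [
    ("path A", (0, 100), 2),  # ties path B on attrs...
    ("path B", (0, 100), 1),  # ...but wins the final tie-breakers
    ("path C", (0, 300), 0),  # worse on an earlier attribute, never installed
]
print(multipath(paths))                   # (['path B'], 'path B')
print(multipath(paths, maximum_paths=2))  # (['path B', 'path A'], 'path B')
```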

10 – Prefer the oldest path
Just like it reads.  Pick the oldest path and use that one.  The one that’s been around longer is likely more stable.  We can see this happen in the below example…

image

In the first output the path through the BGP next hop of 10.0.0.29 is the best path.  If we clear the BGP session on router4, the route is withdrawn until the BGP session is rebuilt.  In the last output we can now see that the path through 10.0.0.25 is the best path since it’s been there longer than the freshly installed path through 10.0.0.29.

11 – Prefer the lowest Router-ID
The age-old tie-breaker: prefer the path from the neighbor with the lowest router ID.  If you get this far, your path selection probably isn’t of great concern to you.
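To pull the steps together, here is a simplified Python comparator that turns the numbered steps above into one sort key. It is a sketch under stated assumptions, not Cisco’s implementation. MED is compared across all paths, oldest-path is applied to every path, and AS_SET handling is ignored. The `Path` fields and the AS 600 in the example are hypothetical (router6’s AS number is only in the diagram). The example replays router5’s decision from the weight section.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}


@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "i"        # "i" (IGP) < "e" (EGP) < "?" (incomplete)
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0      # IGP cost to the BGP next hop
    received_order: int = 0  # smaller = received earlier (older)
    router_id: str = "0.0.0.0"


def ip_key(router_id):
    return tuple(int(octet) for octet in router_id.split("."))


def sort_key(p: Path):
    """Smaller keys win; step 9 (multipath) does not affect which path is best."""
    return (
        -p.weight,                 # 1. highest weight
        -p.local_pref,             # 2. highest local preference
        not p.locally_originated,  # 3. locally originated first
        len(p.as_path),            # 4. shortest AS path
        ORIGIN_RANK[p.origin],     # 5. lowest origin type
        p.med,                     # 6. lowest MED (simplified)
        not p.ebgp,                # 7. eBGP over iBGP
        p.igp_metric,              # 8. lowest IGP metric to the next hop
        p.received_order,          # 10. oldest path
        ip_key(p.router_id),       # 11. lowest router ID
    )


def best_path(paths):
    return min(paths, key=sort_key)


via_r4 = Path("10.0.0.2 (router4)", as_path=(100, 400), received_order=0)
via_r6 = Path("10.0.0.5 (router6)", as_path=(600, 400), received_order=1)
print(best_path([via_r4, via_r6]).name)  # 10.0.0.2 (router4): tie until step 10, oldest wins
via_r6.weight = 5
print(best_path([via_r4, via_r6]).name)  # 10.0.0.5 (router6): weight is checked first
```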


An important part of understanding BGP is understanding how it sends its routing updates.  Take for example this basic topology…

image

There are 7 routers total and 4 autonomous systems.  We’ll be using EIGRP as the IGP for link reachability.  All of the BGP peering is done with the physical interface IPs; I am not using loopbacks for the iBGP peering since there aren’t any redundant paths.  Let’s assume that EIGRP is up and all local physical interfaces are up.  Let’s also assume that BGP is up and configured with neighbors on each router, but there is no additional configuration.

Let’s start by advertising two prefixes into BGP on router1…

image

First I create a loopback interface and then a static route to null0.  I advertise the loopback with the BGP network command and the static route by redistributing static routes into BGP.  Let’s take a look at the BGP table on router2 to start with…

image

So this is interesting.  Router2 has exactly one path to reach each prefix, and both are through its directly connected link to router1.  Also note that the 172.64.1.0/24 prefix shows ‘400 ?’ in the Path column while the 192.168.1.1/32 prefix shows ‘400 i’.  The trailing character is the origin code, not part of the AS path.  Since the static route (172.64.1.0/24) was redistributed into BGP, the origin of the prefix is incomplete (‘?’).  The loopback, however, was injected using the network statement, so it shows an origin of IGP (‘i’).

Note: ‘i’ stands for IGP, but when I see it I like to think that it means “I manually injected that route into BGP,” whereas ‘?’ means that redistribution did it for me.

So let’s update our diagram to reflect the available paths as we look at each router.  I’ll use a blue arrow to show available paths and a green arrow to show the path that each router selected as its best path…

image

Since I’m trying to figure out why router2 only has one path to each prefix, let’s look at the BGP table on router3 to see what it has…

image

Well, now this is interesting.  Router3 has two paths to each prefix.  Note that the next hop for each of these prefixes is an address that isn’t directly connected to router3.  This is because we haven’t told the edge routers in AS 100 to use the next-hop-self command.  Let’s change that and then look at this again…

image

image

Looking at the BGP table on router3 we can now see that the next hop references the directly connected interfaces of router3…

image

Note that router3’s best path is through router2 for each of the prefixes.  Let’s update the diagram and move on…

image
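The next-hop behavior router3 just showed can be summarized in a small Python function. This is a simplified model: the function name and the two address labels are hypothetical placeholders (the real addresses are in the screenshots), and it ignores corner cases such as third-party next hops on shared segments.

```python
def next_hop_after_advert(current_next_hop, sender_address, session, next_hop_self=False):
    """Next hop recorded by the receiving peer (simplified).

    Over eBGP the sender puts its own address in the next hop. Over iBGP the next hop
    is passed through unchanged unless the sender uses 'neighbor ... next-hop-self'.
    """
    if session == "ebgp" or next_hop_self:
        return sender_address
    return current_next_hop


r1_addr = "router1 address on the router1-router2 link"  # hypothetical label
r2_addr = "router2 address on the router2-router3 link"  # hypothetical label

at_router2 = next_hop_after_advert(None, r1_addr, "ebgp")
print(next_hop_after_advert(at_router2, r2_addr, "ibgp") == r1_addr)                      # True
print(next_hop_after_advert(at_router2, r2_addr, "ibgp", next_hop_self=True) == r2_addr)  # True
```

Without next-hop-self, router3 needs an IGP route to router1’s link address to use the path, which is why the next hops above pointed at non-connected addresses.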

Let’s now look at the BGP table on router4…

image

Interesting.  Once again, router4 only has one path to each prefix.  More interesting is that the best path for each of these prefixes is through router5.  Let’s update the diagram and move on to router5…

image

Router5’s BGP table shows that it once again has a single path through router6 to get to each prefix…

image

This is particularly interesting to me since from an AS perspective the prefix is exactly two AS hops away in either direction.  Once again, we’ll document this and move on…

image

Looking at the router6 BGP table, we see that it has a single path to each prefix through router7.  Note that I already updated these iBGP peerings with the next-hop-self command.

image

image

And finally, we see that router7 also has a single path to the prefixes, through its directly connected peering to router1…

image

image

The final route diagram we have is rather interesting.  While some of it makes sense, other parts don’t.  For instance, why is router4’s best path through router5?  Why does router5 only have one path to the prefixes when it clearly has two equal-length AS paths to them?

Have you figured it out yet?  I’m hoping some of you spotted it in the first diagram.  I broke the cardinal rule of iBGP: all BGP routers in the same AS need to be peered with each other (a full mesh).  This is because a BGP router only shares its best route with its neighbors.  In addition, routes learned from an iBGP peer are not re-advertised to other iBGP peers.

That’s clearly not the case here.  Let’s break down the advertisements…

1. Router1 sends its updates to router7 and router2. 
2. Router7 and router2 mark the path towards router1 as their best path since it’s the only one they have.
3. Router7 sends the update to router6.  Router6 marks the path through router7 as its best path since it’s the only one.  Router2 sends the update to router3, which marks the path as its best path since it currently only has the one.
4. Router6 advertises the paths to router5, which marks them as best since they’re the only ones it has heard so far.  Router3 does NOT advertise any of these prefixes to router4 since they were learned over an iBGP session.  We can see this by looking at the routes advertised from router3 to router4…

image

5. Router5 advertises the path it learned through router6 to router4.  Since router4 doesn’t receive any advertisements from router3, that path becomes router4’s best path.

So let’s fix the iBGP issue and see if things make more sense afterwards…

image

image

Now let’s update the diagram to see where we are now…

image

Ah ha!  This looks better.  Let’s walk through the advertisement process again now that we’ve fixed our problem…

1. Router1 sends its updates to router7 and router2.
2. Router7 and router2 mark the path towards router1 as their best path since it’s the only one they have.
3. Router7 sends the update to router6. Router6 marks the path through router7 as its best path since it’s the only one. Router2 sends the update to router3, which marks the path as its best path since it currently only has the one.  Router2 also sends the update to router4, which also marks it as its best path since it’s currently the only one it has.
4. Router6 advertises the paths to router5, which marks them as best since they’re the only ones it has heard so far. Router3 and router4 do not advertise the route to each other since they heard the advertisement via iBGP.  We can see that here…

 image

image

5. Router5 advertises the path it learned through router6 to router4. Router4 in turn advertises the AS 100 best path through router2 to router5.  Both routers already have paths to the prefixes so they now need to decide which one is best.  Looking at the BGP table we can see how they picked the best path…

image

For router4, the choice was easy.  The AS path to AS 400 via router2 (learned over iBGP) is much shorter.  Note that the path is listed with a next hop of 10.0.0.25 since I didn’t use the neighbor next-hop-self command on the new iBGP peering.  Since this is the best path the AS has, router4 has to advertise it to router5.  Let’s look at router5’s BGP table…

image

On router5, the AS path for each path is the same length.  Without diving into BGP path selection (that’s the next post!) I can tell you that the route through router6 (10.0.0.5) is being selected since it’s the oldest route. 

So there we have it.  The key takeaways from this post are these rules…

-iBGP learned routes are never advertised to other iBGP peers
-Only the best path to any given prefix is advertised to another BGP peer. 
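Here is a minimal Python sketch of those two rules, applied to the AS 100 routers before and after the fix. It assumes each router’s only path is also its best path, which matches the walkthroughs above. The `should_advertise` and `ibgp_reach` functions are hypothetical helpers, not a BGP implementation.

```python
def should_advertise(path_is_best, learned_via, to_peer):
    """learned_via / to_peer are 'ibgp', 'ebgp', or 'local'."""
    if not path_is_best:
        return False  # only the best path is advertised
    if learned_via == "ibgp" and to_peer == "ibgp":
        return False  # iBGP-learned routes are never sent to other iBGP peers
    return True


def ibgp_reach(sessions, origin="router2"):
    """Which AS 100 routers learn a path that `origin` received over eBGP."""
    peers = {}
    for a, b in sessions:
        peers.setdefault(a, set()).add(b)
        peers.setdefault(b, set()).add(a)
    learned = {origin: "ebgp"}
    changed = True
    while changed:
        changed = False
        for router, how in list(learned.items()):
            for peer in peers.get(router, ()):
                if peer not in learned and should_advertise(True, how, "ibgp"):
                    learned[peer] = "ibgp"
                    changed = True
    return sorted(learned)


partial = {("router2", "router3"), ("router3", "router4")}  # the original chain
full = partial | {("router2", "router4")}                  # after fixing the mesh
print(ibgp_reach(partial))  # ['router2', 'router3']: router4 never hears it over iBGP
print(ibgp_reach(full))     # ['router2', 'router3', 'router4']
```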


I hope you are starting to get the point: there are a ton of different maps that can be configured with BGP.  The next one I want to cover is the advertise-map.  On an aggregate, the advertise-map tells the router which prefixes should be part of the aggregate.  This becomes particularly handy when dealing with the as-set option on a BGP aggregate.  However, let’s test this out without as-set first so you get an idea of how it works.

image

Consider our lab above.  Let’s take a look at the ISP B router’s BGP configuration…

image

Notice that I’m specifying an advertise-map of ispb_summary.  Let’s look at that route-map and its associated prefix list…

image

So as you can see, the route-map is matching on the first two /24 routes in ISP B.  Let’s take a look at the BGP table as well…

image

Everything looks as I expect.  Now, let’s remove the two static routes that match the ispb_summary prefix list…

image

Immediately after removing the routes, we look at the BGP table and see that we are no longer advertising the /20 aggregate.  This is because we told the router that the aggregate was built from the first two /24 prefixes.  As with any aggregate, if none of the matching more-specific prefixes are in the BGP table, the aggregate is not advertised.
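Here is a rough Python model of what we just observed: the aggregate exists only while at least one more-specific prefix that the advertise-map matches is in the BGP table. The /20 comes from the lab, but the /24s in `ispb_summary` and in the table are hypothetical (the actual prefix list is only in the screenshot), and `aggregate_components` is an invented helper.

```python
from ipaddress import ip_network


def aggregate_components(aggregate, bgp_table, advertise_map):
    """Return the component prefixes the aggregate is built from, or None if it is not generated."""
    agg = ip_network(aggregate)
    components = [
        p for p in bgp_table
        if ip_network(p) != agg and ip_network(p).subnet_of(agg) and p in advertise_map
    ]
    return components or None


aggregate = "175.15.0.0/20"
ispb_summary = {"175.15.0.0/24", "175.15.1.0/24"}  # hypothetical prefix-list contents

table = ["175.15.0.0/24", "175.15.1.0/24", "175.15.2.0/24"]
print(aggregate_components(aggregate, table, ispb_summary))  # ['175.15.0.0/24', '175.15.1.0/24']

table = ["175.15.2.0/24"]  # the two matching static routes removed
print(aggregate_components(aggregate, table, ispb_summary))  # None
```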

Pretty straightforward, right?  An interesting use case involves the as-set option on the aggregate command.  Let’s restore the lab and take another look at how this works in conjunction with as-set.

Let’s take another look at ISP B’s BGP table…

image

Note that we don’t see a prefix for the /18 aggregate being generated at the ISP A level.  This is because the aggregate on ISP A is being generated with the as-set option.  Let’s look at the BGP table on one of the ISP A routers to verify…

image

As you can see, both /18 routes include AS 100 and AS 300 in their AS set.  Since the /18 advertisement contains each lower-tier ISP’s AS number, the ISP B and ISP C routers reject it: each sees its own AS in the AS path.  Suppose we want to advertise the /18 down to ISP B and ISP C.  This could be achieved by removing the ISP B and ISP C AS numbers from the /18 advertisement.  However, we still need at least one component route for the aggregate to exist.  In this case, we can rely on the advertisement from one lower-tier ISP to provide the aggregate for the other.  That would look like this…

image

Here, the solid green line shows that ISP A will build its /18 aggregate based on receiving the single /20 from ISP B.  In turn, it can then advertise the /18 down to ISP C, since the aggregate will not have AS 300 in its AS path attribute.  The same can be done in the reverse direction.  Let’s try this out on one of the ISP A routers…

image

Above, we configure a route-map with a prefix list matching the 175.15.0.0/20 aggregate coming from ISP B.  We then update our /18 aggregate with an advertise-map that tells the router which prefixes to build the local aggregate (the /18) from.  Once the change has been made, let’s examine the BGP table of the ISP A router to see if it worked…

image

Sure enough.  Note that we still have two /18 prefixes, and that the locally originated /18 lists an AS path that includes only AS 100.  Looking at the BGP table on ISP C, we can see that this worked as expected…

image

There are some flaws with this design.  Namely, if the ISP A iBGP peering fails, ISP C will lose the /18 aggregate, since it’s being built from the prefix coming across that iBGP peering.  However, it is another tool to keep in your BGP belt.
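The reason the advertise-map trick works can be shown with sets of AS numbers. Here is a simplified Python sketch: the AS_SET of an as-set aggregate collects every AS in its contributing routes, and an eBGP receiver rejects the route if its own AS appears anywhere in the path. AS 100 (ISP B) and AS 300 (ISP C) come from the lab; ISP A’s AS number is only in the diagram, so the 200 below is a hypothetical placeholder, and both helper functions are invented for this sketch.

```python
ISP_A = 200  # hypothetical: ISP A's AS number is only shown in the diagram
ISP_B, ISP_C = 100, 300


def aggregate_as_set(component_paths):
    """AS_SET of an 'as-set' aggregate: every AS seen in the contributing routes."""
    members = set()
    for path in component_paths:
        members.update(path)
    return members


def loops_back(receiver_as, sequence, as_set):
    """eBGP loop check: reject if the receiver's AS is in the AS_SEQUENCE or the AS_SET."""
    return receiver_as in sequence or receiver_as in as_set


both = aggregate_as_set([[ISP_B], [ISP_C]])   # /18 built from both /20s
print(loops_back(ISP_C, [ISP_A], both))       # True: ISP C drops the /18

b_only = aggregate_as_set([[ISP_B]])          # /18 built only from ISP B's /20
print(loops_back(ISP_C, [ISP_A], b_only))     # False: ISP C accepts the /18
```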

