Understanding RTs and RDs


One of the items that continues to come up in my conversations with folks learning about MPLS VPNs is defining what a Route Target (RT) and Route Distinguisher (RD) are. More specifically, most seem to understand their purpose – but oftentimes they don’t quite understand the application. I (and many others – just google “Understanding RDs and RTs”) have written about this in the past but I’m hoping to put a finer point on the topic in this post.

If someone were to ask me to summarize what route targets and route distinguishers were – I’d probably define them like this…

Route Distinguishers – serve to make routes unique
Route Targets – metadata used to make route import decisions

Now – I’ll grant that those definitions are awfully terse, but I also feel like this is a topic that is often overcomplicated. So let’s spend some time talking about RTs and RDs separately and then bring it all together in a lab so you can see what’s really happening.

Route Distinguishers

As I said, a route distinguisher serves to make routes look unique. So why do we care about making routes look unique? I’d argue one of the most popular MPLS applications in use today is MPLS VPNs. If you’re unfamiliar with the topic, I suggest you start here but the base premise is that we can have a single network that supports multiple other networks (aka customers). In that scenario, your one common network (perhaps that of the service provider) has to be able to support multiple customers who are very likely using overlapping IP addresses. This of course poses a problem for a router in the service provider network. If customer A is sending 10.0.0.0/8 and customer B is sending 10.0.0.0/8 and my job as a router is to pick the best path to a given destination, how can I choose which one to use? What happens when I pick customer A’s route as my best path? The solution to this problem is to make each customer’s routes unique. The means for doing this with MPLS VPNs is the RD, or route distinguisher. So we take our normal 32 bit IPv4 prefix, and we prepend a 64 bit RD to it…

Note: It’s assumed we’re talking about IPv4 prefixes here and for the rest of this post.

By prefacing our 32 bit (4 byte) IPv4 prefix with a 64 bit RD (8 byte) we create a 96 bit (12 byte) VPNv4 prefix.

Note: I think it’s also worth talking about the prefix or subnet mask. Many folks (including myself) think of the entire prefix when we hear the term “prefix”. That is, the prefix itself along with its accompanying mask (e.g. 10.10.10.0/24). When we talk about turning an IPv4 prefix into a VPNv4 prefix – we only refer to the prefix itself which is why we’re only accounting for 32 bits. The mask is still there, and is an additional 32 bits, but since the mask is largely irrelevant in terms of making a prefix unique it’s often not referred to or talked about when discussing changing IPv4 to RD prefixed VPNv4 routes.

The RD can come in 3 different forms…

By and large, the most common type I see is a type 0 but it’s really a matter of whatever format you decide to use. The layout makes no difference so long as you’re consistent across all routers that will carry the VPNv4 prefixes. The reason for consistency is one we’ll see later on when we do the lab work.
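To make the three layouts a bit more concrete, here is a minimal Python sketch of how an RD could be packed, going by my reading of RFC 4364: a 2 byte type field followed by a 6 byte value whose split depends on the type. The helper names (encode_rd, vpnv4_address) are made up for illustration and are not part of any router or library API.

```python
import ipaddress
import struct


def encode_rd(rd_type, admin, assigned):
    """Pack an 8 byte RD: a 2 byte type field plus a 6 byte value."""
    if rd_type == 0:
        # Type 0: 2 byte AS number, 4 byte assigned number
        return struct.pack("!HHI", 0, admin, assigned)
    if rd_type == 1:
        # Type 1: 4 byte IPv4 address, 2 byte assigned number
        return struct.pack("!H4sH", 1, ipaddress.IPv4Address(admin).packed, assigned)
    if rd_type == 2:
        # Type 2: 4 byte AS number, 2 byte assigned number
        return struct.pack("!HIH", 2, admin, assigned)
    raise ValueError(f"unknown RD type {rd_type}")


def vpnv4_address(rd, ipv4):
    """Prepend the 8 byte RD to the 4 byte IPv4 prefix (mask not included)."""
    return rd + ipaddress.IPv4Address(ipv4).packed


rd_a = encode_rd(0, 65000, 1)   # customer A, written 65000:1
rd_b = encode_rd(0, 65000, 2)   # customer B, written 65000:2
vpn_a = vpnv4_address(rd_a, "10.0.0.0")
vpn_b = vpnv4_address(rd_b, "10.0.0.0")

print(len(rd_a), len(vpn_a))    # 8 12
print(vpn_a.hex())              # 0000fde8000000010a000000
print(vpn_a != vpn_b)           # True
```

Whichever type you pick, the result is always 8 bytes, so the VPNv4 prefix is always 12 bytes. The same 10.0.0.0 prefix ends up as two different byte strings once each customer has its own RD.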

Once we’ve decided on what type of RD prefix we want to use, we can configure it on our router so that the router can advertise VPNv4 prefixes to other routers. Assuming that we’re coming from AS 65000 our customer A and customer B VPNv4 prefixes might look like this….

Customer A -> 65000:1:10.0.0.0
Customer B -> 65000:2:10.0.0.0

By prepending the appropriate RD to a customer prefix we make it unique which means that multiple customers with overlapping IPv4 prefixes can live in the same routing table. In Juniper land – this routing table is referred to as bgp.l3vpn.0 and it is where all the IPv4 L3VPN (or VPNv4) routes live.

Route Targets

Route targets are a completely different animal than route distinguishers. To put it plainly, a route target (RT) is just a BGP extended community that is or can be appended to routes. For those of you on Juniper gear, who are already doing MPLS VPNs, this is likely already very apparent to you because of their implementation (more on this later). So if it’s just an extended community, then why is it called a Route Target? Well – that’s because there is a specific type of extended community just for route targets. If you read RFC 7153 you’ll see that RTs can be of type 0 (0x00), 1 (0x01), or 2 (0x02) which are transitive types.

Note: Transitive just means that it’s a community that can stay on the advertisement. In the case of RTs, we want them to propagate which means they have to be transitive.

If you’re paying close attention when you read that RFC – you’ll notice that the format for type 0, 1, and 2 BGP extended communities very closely follows the types for the RDs we discussed above. From the RFC…

This is one of the main reasons you will frequently see people use matching RTs and RDs. As we’ll see in the lab later though, there are tradeoffs depending on how you allocate RTs and RDs.

As a last comment – it’s also worthwhile to point out that RTs have a well defined sub type under each of these main types which is 2 (0x02).
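Putting the type and sub type together, here is a rough Python sketch of what a type 0 route target looks like as an 8 byte extended community, going by my reading of RFC 4360 and RFC 7153. The function names are hypothetical and only there for illustration.

```python
import struct

RT_SUBTYPE = 0x02       # the route target sub type
TRANSITIVE_BIT = 0x40   # when this bit of the type byte is clear, the community is transitive


def encode_rt_two_octet_as(asn, local_admin):
    """Type 0x00 RT: type, sub type, 2 byte AS number, 4 byte local value."""
    return struct.pack("!BBHI", 0x00, RT_SUBTYPE, asn, local_admin)


def parse_target(text):
    """Turn a string like 'target:65000:1' into its 8 byte encoding."""
    keyword, asn, local_admin = text.split(":")
    if keyword != "target":
        raise ValueError("not a route target")
    return encode_rt_two_octet_as(int(asn), int(local_admin))


def is_transitive(community):
    """Check the transitive bit in the first (type) byte."""
    return (community[0] & TRANSITIVE_BIT) == 0


rt = parse_target("target:65000:1")
print(rt.hex())            # 0002fde800000001
print(is_transitive(rt))   # True
```

Compare the last six bytes with a type 0 RD for 65000:1 and you’ll see they are identical, which is why matching RTs and RDs are so easy to reason about.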

Lab time

Alright – now that I’ve talked your ear off – let’s go and see some of this in action in the lab. The lab I’m using today is based on Juniper vMX (freely available to trial (you should go and try it out!)) and looks like this…

Note: The blue numbers indicate interface numbering. For instance, 0 indicates interface ge-0/0/0

As you can see, there are 7 routers in the lab, 3 of which have clients connected to them (labelled 1, 2, and 3). I’ll refer to these clients as “endpoints” when we talk about testing reachability to and from them. The intent of the lab is for endpoints 1 and 3 to be part of the same customer network while endpoint 2 is part of its own customer network. The initial configuration of the lab is that all interfaces are configured and OSPF is configured for the routers’ physical interfaces and loopbacks all within the same area. If you look closely – you’ll notice that vMX2 and vMX3 have the same local prefix defined (10.10.20.0/24). Here are the relevant bits of each router’s base configuration…

vMX1

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.10.10.1/24;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.0/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 1.1.1.1/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 1.1.1.1;
}
protocols {
    ospf {
        area 0.0.0.0 {                  
            interface lo0.0;
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
        }
    }
}

vMX2

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.10.20.1/24;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.2/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 2.2.2.2/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 2.2.2.2;
}
protocols {
    ospf {
        area 0.0.0.0 {                  
            interface lo0.0;
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
        }
    }
}

vMX3

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.10.20.1/24;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.4/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 3.3.3.3/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 3.3.3.3;
}
protocols {
    ospf {
        area 0.0.0.0 {                  
            interface lo0.0;
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
        }
    }
}

vMX4

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 169.254.10.1/31;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.8/31;
            }
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 169.254.10.7/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 4.4.4.4/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 4.4.4.4;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0 {
                interface-type p2p;
            }
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
            interface ge-0/0/2.0 {
                interface-type p2p;
            }
        }
    }
}

vMX5

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 169.254.10.9/31;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.10/31;
            }
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 169.254.10.3/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 5.5.5.5/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 5.5.5.5;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0 {
                interface-type p2p;
            }
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
            interface ge-0/0/2.0 {
                interface-type p2p;
            }
        }
    }
}

vMX6

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 169.254.10.6/31;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.11/31;
            }
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 169.254.10.5/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 6.6.6.6/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 6.6.6.6;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0 {
                interface-type p2p;
            }
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
            interface ge-0/0/2.0 {
                interface-type p2p;
            }
        }
    }
}

vMX7

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 169.254.10.12/31;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.16/31;
            }
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 169.254.10.14/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 7.7.7.7/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    resolution {
        rib bgp.l3vpn.0 {
            resolution-ribs inet.0;
        }
    }
    router-id 7.7.7.7;
    autonomous-system 65000;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0 {
                interface-type p2p;
            }
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
            interface ge-0/0/2.0 {
                interface-type p2p;
            }
        }
    }
}

A couple of things to point out.

  • vMX7 is going to act as a route reflector but is not going to be part of our label domain or data plane. Because of this – it’s going to have issues finding next hops for the routes it receives which will mean the routes will be hidden and not sent to other RR (route reflector) peers. To fix this, we use the set routing-options resolution rib bgp.l3vpn.0 resolution-ribs inet.0 command to tell the router to use the information in inet.0 rather than inet.3 to resolve the routes and make them valid for readvertisement.
  • You may have noticed that I did not include the client facing interfaces in OSPF meaning that the clients won’t currently have reachability to each other. That’s because I want to put all of these clients into VRFs. We’ll do that below.

Alright – so let’s create a VRF on vMX[1-3] for each of our endpoints to live in. To do that, we’ll create a local routing instance and map the endpoint facing interface into it…

vMX1

set routing-instances customer_a instance-type vrf
set routing-instances customer_a interface ge-0/0/0.0
set routing-instances customer_a route-distinguisher 65000:1
set routing-instances customer_a vrf-target target:65000:1

vMX2

set routing-instances customer_b instance-type vrf
set routing-instances customer_b interface ge-0/0/0.0
set routing-instances customer_b route-distinguisher 65000:2
set routing-instances customer_b vrf-target target:65000:2

vMX3

set routing-instances customer_a instance-type vrf
set routing-instances customer_a interface ge-0/0/0.0
set routing-instances customer_a route-distinguisher 65000:1
set routing-instances customer_a vrf-target target:65000:1

So let’s take a moment to digest all of this. We’re defining a routing instance on each of the three routers that has an instance type of vrf. In Juniper parlance, that implies you’re going to use this routing-instance with MP-BGP. If you want a local VRF – you would define it with an instance type of virtual-router. Once defined, we place the endpoint facing interface in the VRF – nothing too exciting there. Next we define an RD and an RT. The interesting bit is that Juniper wants you to call out that the RT is a target type extended community which means that you have to preface the RT with target:. Other than that, it’s all pretty basic. We’re using an RD of type 0 and aligning our RTs with our RDs. Once the configuration is committed, we should see that we now have a local table for each VRF….

[email protected]> show route table customer_a.inet.0 

customer_a.inet.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[Direct/0] 00:06:43
                    > via ge-0/0/0.0
10.10.10.1/32      *[Local/0] 00:06:43
                      Local via ge-0/0/0.0

[email protected]> 

Awesome. But this isn’t that handy at this point because we’re not telling any other routers about our local endpoint networks. The next step is to configure BGP so that we can advertise these prefixes to other routers. Recall from above that I mentioned vMX7 was going to act as a route reflector for us. That being said, each of the edge routers (vMX[1-3]) will need to peer with vMX7. However, that by itself won’t be enough to make this all work. If we only peer certain routers in the network with BGP – then only certain routers in the network will know about the prefixes that are advertised through BGP. The problem occurs when endpoint 1 tries to talk to endpoint 3. vMX1 will have learned about the prefix for endpoint 3 from vMX3 – but the path to get there has to traverse vMX4 and vMX6, neither of which know about the routes being advertised into BGP. The fix for this is of course MPLS and a label distribution protocol such as LDP. Configuring vMX[4-6] with MPLS and LDP will allow them to forward traffic based on labels without having knowledge of the routes being advertised through BGP, giving us what is often referred to as a “BGP free core”. If the basic concepts of MPLS aren’t concrete in your mind at this point I’d suggest you read my intro to MPLS posts here, here, and here.

Let’s now configure BGP, MPLS, and LDP on all of the routers…

vMX1

set routing-options autonomous-system 65000   
set protocols bgp group internal peer-as 65000
set protocols bgp group internal local-address 1.1.1.1
set protocols bgp group internal neighbor 7.7.7.7
set protocols bgp group internal family inet-vpn unicast
set protocols mpls interface ge-0/0/1.0
set protocols ldp interface ge-0/0/1.0
set interfaces ge-0/0/1.0 family mpls

vMX2

set routing-options autonomous-system 65000   
set protocols bgp group internal peer-as 65000
set protocols bgp group internal local-address 2.2.2.2
set protocols bgp group internal neighbor 7.7.7.7
set protocols bgp group internal family inet-vpn unicast
set protocols mpls interface ge-0/0/1.0
set protocols ldp interface ge-0/0/1.0
set interfaces ge-0/0/1.0 family mpls

vMX3

set routing-options autonomous-system 65000   
set protocols bgp group internal peer-as 65000
set protocols bgp group internal local-address 3.3.3.3
set protocols bgp group internal neighbor 7.7.7.7
set protocols bgp group internal family inet-vpn unicast
set protocols mpls interface ge-0/0/1.0
set protocols ldp interface ge-0/0/1.0
set interfaces ge-0/0/1.0 family mpls

vMX[4-6]

set protocols mpls interface ge-0/0/0.0 
set protocols ldp interface ge-0/0/0.0 
set interfaces ge-0/0/0.0 family mpls
set protocols mpls interface ge-0/0/1.0
set protocols ldp interface ge-0/0/1.0
set interfaces ge-0/0/1.0 family mpls
set protocols mpls interface ge-0/0/2.0
set protocols ldp interface ge-0/0/2.0
set interfaces ge-0/0/2.0 family mpls

vMX7

set routing-options autonomous-system 65000   
set protocols bgp group internal peer-as 65000
set protocols bgp group internal local-address 7.7.7.7
set protocols bgp group internal neighbor 1.1.1.1
set protocols bgp group internal neighbor 2.2.2.2
set protocols bgp group internal neighbor 3.3.3.3
set protocols bgp group internal cluster 0.0.0.0
set protocols bgp group internal family inet-vpn unicast

Most of that configuration should look pretty familiar to you. However – do note that we enabled the inet-vpn unicast family as part of the BGP configuration. Doing so allows the propagation of VPNv4 routes which is what we’re aiming to do. Also note that we’re including a cluster ID on vMX7 which is what tells the router that it’s a route reflector.

Alright – now we should be cooking with gas here. After that configuration is in – let’s take a look and see what’s going on. Let’s start by looking at the customer_a routing table on vMX1…

[email protected]> show route table customer_a.inet.0 

customer_a.inet.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[Direct/0] 00:03:29
                    > via ge-0/0/0.0
10.10.10.1/32      *[Local/0] 00:03:29
                      Local via ge-0/0/0.0

[email protected]> 

Well – this is not what I had hoped for. I was hoping that we would be seeing the 10.10.20.0/24 route from vMX3 which shared a common RD/RT. By default JunOS will import and export routes from VRFs if they share the same RT. Remember – the important decision maker for which routes get imported where is the RT – not the RD. So let’s take a look and see what routes we’re advertising to our peers from vMX1…

[email protected]> show route advertising-protocol bgp 7.7.7.7 extensive 

customer_a.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
* 10.10.10.0/24 (1 entry, 1 announced)
 BGP group internal type Internal
     Route Distinguisher: 65000:1
     BGP label allocation failure: Need a nexthop address on LAN
     Nexthop: Not advertised
     Flags: Nexthop Change
     Localpref: 100
     AS path: [65000] I 
     Communities: target:65000:1

[email protected]> 

So it looks like we’re at least trying to advertise our customer_a route but if we look at the BGP label allocation failure line we see that we’re having an issue. Need a nexthop address on LAN appears to be the error. For those of you that read my previous MPLS VPN article you might recognize this. So let’s back up a second and review why this is happening.

In our lab – routers vMX[1-3] are acting like PE routers. That is, they are terminating a customer VRF locally with the configuration of the routing-instance and its associated physical interface. On top of that, we have a client connected to this interface on a broadcast network. In a more traditional setup, you might not actually have clients directly connected to a PE router. Rather, you’d have a customer router (CE (Customer Edge)) that was running some sort of routing protocol with the PE. In those scenarios, the PE can associate a label with a next hop (the address it learns from the CE). With a broadcast network we can’t do that since we have to ARP for the client’s address. There are a few ways to fix this (discussed here) but by and large the quickest and easiest way to fix this is with the command vrf-table-label. Juniper does a pretty good job of summarizing how this feature works…

Map the inner label of a packet to a specific VPN routing and forwarding (VRF) instance. This allows the examination of the encapsulated IP header. The first lookup is done on the VPN label to determine which VRF instance to refer to, and the second lookup is done on the IP header to determine how to forward packets to the correct end hosts.

When you include the vrf-table-label statement in the configuration of a VRF routing instance, a label-switched interface (LSI) logical interface label is created and mapped to the VRF routing table. Any routes in the VRF routing table are advertised with the LSI logical interface label allocated for the VRF routing table. When packets destined for the VRF routing instance arrive on a core-facing interface, they are treated as if the enclosed IP packet arrived on the LSI interface and are then forwarded and filtered based on the correct table.

All routes in a VRF routing instance configured with this option are advertised with one label allocated per VRF.
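The two lookups described in that quote can be sketched in a few lines of Python. This is only a toy model of the idea, not how Junos implements it: the tables are plain dictionaries, the forward function is a made-up name, and label 16 is simply the VPN label that shows up for customer_a later in this lab.

```python
import ipaddress

# First table: one VPN label per VRF (what vrf-table-label gives us)
label_to_vrf = {16: "customer_a"}

# Second table: the per-VRF IP routing table
vrf_tables = {
    "customer_a": {ipaddress.ip_network("10.10.10.0/24"): "ge-0/0/0.0"},
}


def forward(vpn_label, dst_ip):
    """Resolve the VRF from the label, then do an IP lookup inside that VRF."""
    vrf = label_to_vrf[vpn_label]                 # lookup 1: label -> VRF
    dst = ipaddress.ip_address(dst_ip)
    matches = [net for net in vrf_tables[vrf] if dst in net]
    if not matches:
        return vrf, None
    best = max(matches, key=lambda net: net.prefixlen)   # lookup 2: longest match
    return vrf, vrf_tables[vrf][best]


print(forward(16, "10.10.10.50"))   # ('customer_a', 'ge-0/0/0.0')
```

Because the label only identifies the VRF, the router no longer needs a per next hop label, which is what gets us around the broadcast network problem.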

So let’s configure this on the routers vMX[1-3]…

vMX[1,3]

set routing-instances customer_a vrf-table-label

vMX2

set routing-instances customer_b vrf-table-label 

And if we take a look at the interfaces on one of these routers, we’ll now see we have the LSI interfaces that the Juniper documentation refers to…

[email protected]> show interfaces terse | grep lsi    
lsi                     up    up
lsi.0                   up    up   inet    

[email protected]>

Now let’s take a look at the routes we’re advertising to the route reflector (vMX7)…

[email protected]> show route advertising-protocol bgp 7.7.7.7 extensive    

customer_a.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
* 10.10.10.0/24 (1 entry, 1 announced)
 BGP group internal type Internal
     Route Distinguisher: 65000:1
     VPN Label: 16
     Nexthop: Self
     Flags: Nexthop Change
     Localpref: 100
     AS path: [65000] I 
     Communities: target:65000:1

[email protected]> 

Now let’s also look at our local bgp.l3vpn.0 table on vMX1…

[email protected]> show route table bgp.l3vpn.0 

bgp.l3vpn.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

65000:1:10.10.20.0/24                
                   *[BGP/170] 00:10:33, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.1 via ge-0/0/1.0, Push 16, Push 299824(top)

[email protected]>  

Aha! Our first glimpse of a VPNv4 prefix! The customer route 10.10.20.0/24 was prepended with the RD of 65000:1 to form a VPNv4 prefix. Awesome! And now if we try and do some test pings between endpoints 1 and 3 we should see that they are working…

PING 10.10.20.1 (10.10.20.1) 56(84) bytes of data.
64 bytes from 10.10.20.1: icmp_seq=1 ttl=61 time=1.84 ms
64 bytes from 10.10.20.1: icmp_seq=2 ttl=61 time=2.91 ms
64 bytes from 10.10.20.1: icmp_seq=3 ttl=61 time=1.94 ms
64 bytes from 10.10.20.1: icmp_seq=4 ttl=61 time=1.97 ms

Awesome. So at this point – we’ve pretty much just redone the previous post I did on MPLS VPNs. Now – I want to focus on the impact of RTs and RDs. At this point – all the edge routers are sending their VPNv4 routes to the route reflector…

The route reflector will see all of these routes as unique VPNv4 routes. We can validate that by looking at its bgp.l3vpn.0 table…

[email protected]> show route table bgp.l3vpn.0    

bgp.l3vpn.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

65000:1:10.10.10.0/24                
                   *[BGP/170] 3d 04:59:29, localpref 100, from 1.1.1.1
                      AS path: I, validation-state: unverified
                    > to 169.254.10.15 via ge-0/0/2.0, Push 16
65000:1:10.10.20.0/24                
                   *[BGP/170] 3d 05:09:08, localpref 100, from 3.3.3.3
                      AS path: I, validation-state: unverified
                    > to 169.254.10.17 via ge-0/0/1.0, Push 16
65000:2:10.10.20.0/24                
                   *[BGP/170] 00:00:02, localpref 100, from 2.2.2.2
                      AS path: I, validation-state: unverified
                    > to 169.254.10.13 via ge-0/0/0.0, Push 22

[email protected]> 

These routes are then sent to all of the route reflector peers…

In this case, things seem to be working as expected. Granted, vMX1 and vMX3 don’t really need to get a copy of the customer_b routes since they don’t have any customer_b interfaces. The same goes for vMX2 which is receiving all of the customer_a routes. However – it is important that the routers receive the routes and make their own decision on what to do with them. Now – let’s intentionally make a misconfiguration on vMX2…

set routing-instances customer_b route-distinguisher 65000:1

What we’re doing here is intentionally overlapping our RD for customer_b with that of customer_a which currently has an active working path from vMX1 to vMX3. To update our graphic, things now look like this…

Notice that vMX2 is now sending the customer_b route with the same RD as customer_a. Now let’s take a look at our ping….

64 bytes from 10.10.10.1: icmp_seq=4386 ttl=60 time=2.19 ms
64 bytes from 10.10.10.1: icmp_seq=4387 ttl=60 time=2.35 ms
64 bytes from 10.10.10.1: icmp_seq=4388 ttl=60 time=2.34 ms
64 bytes from 10.10.10.1: icmp_seq=4389 ttl=60 time=2.15 ms
From 10.10.10.1 icmp_seq=4390 Destination Net Unreachable
From 10.10.10.1 icmp_seq=4391 Destination Net Unreachable
From 10.10.10.1 icmp_seq=4392 Destination Net Unreachable

Huh. So that’s not good. What’s going on? Remember – RDs serve to make routes look unique. Specifically – since a JunOS router will store all of its VPNv4 routes in the bgp.l3vpn.0 table they need to be uniquely identifiable there. If they look the same – the router will do what routers do – pick the best one and advertise it on to other peers. Let’s take a look at the bgp.l3vpn.0 table on vMX7 again…

[email protected]> show route table bgp.l3vpn.0 

bgp.l3vpn.0: 2 destinations, 3 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

65000:1:10.10.10.0/24                
                   *[BGP/170] 3d 05:11:28, localpref 100, from 1.1.1.1
                      AS path: I, validation-state: unverified
                    > to 169.254.10.15 via ge-0/0/2.0, Push 16
65000:1:10.10.20.0/24                
                   *[BGP/170] 00:00:14, localpref 100, from 2.2.2.2
                      AS path: I, validation-state: unverified
                    > to 169.254.10.13 via ge-0/0/0.0, Push 23
                    [BGP/170] 3d 05:21:07, localpref 100, from 3.3.3.3
                      AS path: I, validation-state: unverified
                    > to 169.254.10.17 via ge-0/0/1.0, Push 16

[email protected]> 

Compare this with the output we saw before we made the misconfiguration and you’ll notice that we no longer have 3 distinct routes. Rather, we have 2 unique routes and one of them (65000:1:10.10.20.0/24) shows two possible next hops. Furthermore, if we look at what is being sent to vMX1 now…

[email protected]> show route advertising-protocol bgp 1.1.1.1 

bgp.l3vpn.0: 2 destinations, 3 routes (2 active, 0 holddown, 0 hidden)
  Prefix		  Nexthop	       MED     Lclpref    AS path
  65000:1:10.10.20.0/24                    
*                         2.2.2.2                      100        I

[email protected]> 

We see that the route reflector is preferring the route to vMX2 for the prefix which it then happily advertises to vMX1…
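Here is a small Python sketch of why that happens, treating the route reflector’s bgp.l3vpn.0 table as a dictionary keyed on the RD plus the prefix. The reflect function is a made-up name, and the tie-breaker (lowest router ID wins) is an assumption I’m using as a stand-in for the full BGP path selection process. It happens to match the outcome we see in this lab.

```python
import ipaddress
from collections import defaultdict


def reflect(advertisements):
    """Return the single best originator for each (RD, prefix) key."""
    table = defaultdict(list)
    for rd, prefix, originator in advertisements:
        table[(rd, prefix)].append(originator)
    # Assumption: lowest router ID wins when everything else is equal
    return {key: min(paths, key=ipaddress.IPv4Address) for key, paths in table.items()}


# Before the misconfiguration: three distinct VPNv4 routes
before = [
    ("65000:1", "10.10.10.0/24", "1.1.1.1"),
    ("65000:2", "10.10.20.0/24", "2.2.2.2"),
    ("65000:1", "10.10.20.0/24", "3.3.3.3"),
]

# After vMX2 reuses RD 65000:1: two of the routes now share a key
after = [
    ("65000:1", "10.10.10.0/24", "1.1.1.1"),
    ("65000:1", "10.10.20.0/24", "2.2.2.2"),
    ("65000:1", "10.10.20.0/24", "3.3.3.3"),
]

print(len(reflect(before)))                          # 3
print(len(reflect(after)))                           # 2
print(reflect(after)[("65000:1", "10.10.20.0/24")])  # 2.2.2.2
```

Once the keys collide, only one path per key gets reflected, so the path from vMX3 never makes it to vMX1.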

Now – you might be thinking “Oh no! We’re now crossing traffic between customer_a and customer_b!”. Indeed – it is logical to think that our ping reply intended to head back to endpoint 3 is now going to endpoint 2 because that’s where the return routing is pointing. However – that’s not true at all. Let’s look at vMX1…

[email protected]> show route table customer_a.inet.0  

customer_a.inet.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[Direct/0] 01:26:10
                    > via ge-0/0/0.0
10.10.10.1/32      *[Local/0] 01:26:10
                      Local via ge-0/0/0.0

[email protected]> 

The customer_a routing instance on vMX1 doesn’t have any routes learned through BGP at all. So huh…. Let’s check the bgp.l3vpn.0 table and see what’s going on…

[email protected]> show route table ?            
Possible completions:
  <table>              Name of routing table
  customer_a.inet.0    
  customer_a.inet6.0   
  inet.0               
  inet.3               
  inet6.0              
  mpls.0               
[email protected]> show route table   

Huh – so the table doesn’t even exist (a sign that the router has not learned any VPNv4 routes). OK…. Let’s see what the route reflector is sending us…

[email protected]> show route receive-protocol bgp 7.7.7.7 

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)

inet.3: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)

customer_a.inet.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)

mpls.0: 13 destinations, 13 routes (13 active, 0 holddown, 0 hidden)

inet6.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)

customer_a.inet6.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)

[email protected]> 

So. Huh. We just saw that the route reflector was sending us routes albeit the wrong one. So what gives? There are a couple of issues at play here that are worth talking through.

First – Juniper routers that are not RRs will not accept routes that do not match an existing vrf-import policy or a defined vrf-target. As Juniper puts it

VPN route processing differs from normal BGP route processing in one way. In BGP, routes are accepted if they are not explicitly rejected by import policy. However, because many more VPN routes are expected, the Junos OS does not accept (and hence store) VPN routes unless the route matches at least one VRF import policy. If no VRF import policy explicitly accepts the route, it is discarded and not even stored in the bgp.l3vpn.0 table. As a result, if a VPN change occurs on a PE router—such as adding a new VRF table or changing a VRF import policy—the PE router sends a BGP route refresh message to the other PE routers (or to the route reflector if this is part of the VPN topology) to retrieve all VPN routes so they can be reevaluated to determine whether they should be kept or discarded.
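That accept-or-discard behavior can be sketched in Python as well. This is a simplified model under my own assumptions: each VRF is reduced to the set of RTs it imports, and the function names (accepted_routes, importing_vrfs) are hypothetical. Note that the RD never factors into the decision.

```python
def accepted_routes(received, vrf_import_targets):
    """Keep a VPN route only if at least one local VRF imports one of its RTs."""
    wanted = set().union(*vrf_import_targets.values())
    return [route for route in received if route["targets"] & wanted]


def importing_vrfs(route, vrf_import_targets):
    """List the local VRFs whose import targets match the route."""
    return sorted(
        name for name, targets in vrf_import_targets.items()
        if targets & route["targets"]
    )


# The route the reflector sends after the RD misconfiguration on vMX2
received = [
    {"rd": "65000:1", "prefix": "10.10.20.0/24", "targets": {"target:65000:2"}},
]

# A PE that only has customer_a configured
vrfs = {"customer_a": {"target:65000:1"}}
print(accepted_routes(received, vrfs))   # []

# Add a second VRF that imports target:65000:2 and evaluate again
vrfs["customer_b"] = {"target:65000:2"}
kept = accepted_routes(received, vrfs)
print(importing_vrfs(kept[0], vrfs))     # ['customer_b']
```

Even though the route carries the RD 65000:1, it is only ever a candidate for a VRF that imports target:65000:2.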

So long story short, the route reflector is sending the routes to vMX1, but the router just won’t accept the routes if it doesn’t need them. What is not ideal is that the command show route receive-protocol bgp is a little deceiving in that it makes you think you should be looking at the routes fresh off the wire. The reality is that the routes need to be stored somewhere in order to be parsed by the command. Since the router is flat out rejecting them, they aren’t stored and therefore won’t show up in any of our show commands. To get them to show up we can define a routing-instance with a matching RT so that the router imports the VPNv4 prefix. So let’s make up a story like “our service provider is pre-provisioning customer_b off of vMX1”. So let’s define the VRF for customer_b on vMX1…

set routing-instances customer_b instance-type vrf
set routing-instances customer_b route-distinguisher 65000:2
set routing-instances customer_b vrf-target target:65000:2
set routing-instances customer_b vrf-table-label

The above commands define the VRF and the vrf-target that matches the RT we’re looking for in order to get the route to import. So let’s commit that and check again…

[email protected]> show route receive-protocol bgp 7.7.7.7 table bgp.l3vpn.0 extensive 

bgp.l3vpn.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
* 65000:1:10.10.20.0/24 (1 entry, 0 announced)
     Import Accepted
     Route Distinguisher: 65000:1
     VPN Label: 21
     Nexthop: 2.2.2.2
     Localpref: 100
     AS path: I  (Originator)
     Cluster list:  0.0.0.0
     Originator ID: 2.2.2.2
     Communities: target:65000:2

[email protected]> show route table bgp.l3vpn.0                                           

bgp.l3vpn.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

65000:1:10.10.20.0/24                
                   *[BGP/170] 00:03:09, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.1 via ge-0/0/1.0, Push 21, Push 299840(top)

[email protected]> 

Awesome – so we can see that the route is coming in from the RR now. We can also see the RD and RT that exist on the route. In this case, the RD for the route is 65000:1 and the RT is target:65000:2.
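To make the "store it only if somebody wants it" behavior concrete, here is a minimal Python sketch of the decision as I understand it from the Juniper text quoted earlier. It is a toy model, not Junos internals: the function name `keep_vpn_route` and the dictionaries are hypothetical, and only the RT values come from the lab.

```python
# Simplified model of how a non-RR PE decides whether to keep a VPN route.
# Illustrative only: names and data structures are hypothetical.

def keep_vpn_route(route_targets, local_vrf_import_targets):
    """Return True if at least one local VRF imports one of the route's RTs."""
    wanted = set()
    for targets in local_vrf_import_targets.values():
        wanted |= set(targets)
    return bool(wanted & set(route_targets))


# RT carried by 65000:1:10.10.20.0/24 in the output above.
route_rts = {"target:65000:2"}

# Before the pre-provision: only customer_a exists on vMX1.
before = {"customer_a": {"target:65000:1"}}
# After defining customer_b with vrf-target target:65000:2.
after = {"customer_a": {"target:65000:1"}, "customer_b": {"target:65000:2"}}

print(keep_vpn_route(route_rts, before))  # False: discarded, never stored
print(keep_vpn_route(route_rts, after))   # True: stored in bgp.l3vpn.0
```

Notice that the RD never appears in the function. Whether the route is kept depends only on the RTs.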

So now that we fixed the first issue that was preventing cross pollination of customer routes, let’s talk about the second issue. If we look at the customer_a routing table, you’ll see we’re still missing the route…

[email protected]> show route table customer_a.inet.0 

customer_a.inet.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[Direct/0] 01:38:07
                    > via ge-0/0/0.0
10.10.10.1/32      *[Local/0] 01:38:07
                      Local via ge-0/0/0.0

[email protected]> 

And that’s because the route we’re getting for 10.10.20.0/24 has an RT of 65000:2, which does not match the vrf-target of the customer_a VRF. It does however match the vrf-target of customer_b, which now has a local table of its own since our pre-provision of customer_b…

[email protected]> show route table customer_b.inet.0  

customer_b.inet.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.20.0/24      *[BGP/170] 00:05:06, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.1 via ge-0/0/1.0, Push 21, Push 299840(top)

[email protected]> 

I mentioned this earlier but the route import decision is based on the vrf-target and the vrf-import statements. Think of vrf-target as being its own basic sort of import and export policy that says “I will export routes from this VRF with this route target (vrf-target) and I will import routes from the bgp.l3vpn.0 table that match the same route target (vrf-target)”.
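If it helps, the vrf-target behavior described above can be sketched in a few lines of Python. This is a toy model under my own assumptions (the `Vrf` class and its methods are made up for illustration), but it captures the idea that one target does double duty for export and import.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    """Toy VRF: one route target used for both export and import."""
    name: str
    vrf_target: str
    table: dict = field(default_factory=dict)

    def export(self, prefix):
        # Every exported route is tagged with the VRF's own target.
        return {"prefix": prefix, "communities": {self.vrf_target}}

    def maybe_import(self, route):
        # A route is imported only if it carries the VRF's target.
        if self.vrf_target in route["communities"]:
            self.table[route["prefix"]] = route
            return True
        return False


# vMX2 exports the customer_b LAN; vMX1 holds both VRFs after the pre-provision.
remote_b = Vrf("customer_b", "target:65000:2")
local_a = Vrf("customer_a", "target:65000:1")
local_b = Vrf("customer_b", "target:65000:2")

route = remote_b.export("10.10.20.0/24")
print(local_a.maybe_import(route))  # False: RT does not match customer_a
print(local_b.maybe_import(route))  # True: RT matches customer_b
```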

So let’s start talking about more advanced VRF import and export. To start with let’s clean up our customer_b “pre-provision” we did a minute ago on vMX1…

delete routing-instances customer_b

And fix the RD for customer_b on vMX2…

set routing-instances customer_b route-distinguisher 65000:2

Now let’s talk about vrf-import and vrf-export. I noted earlier that using the vrf-target <your target> syntax implied that you were using the same target for import and export. You can also use the syntax vrf-target import <import target> and vrf-target export <export target> if you want to specify unique targets for import and export. While this is totally doable, it’s much more likely that you’d define policies for vrf-import and vrf-export. So let’s take a stab at that…

set policy-options community customer_a members target:65000:1
set policy-options community customer_b members target:65000:2
set policy-options policy-statement cust_a_vrf_import term all from community customer_a
set policy-options policy-statement cust_a_vrf_import term all from community customer_b
set policy-options policy-statement cust_a_vrf_import term all then accept
set policy-options policy-statement cust_a_vrf_import then reject
set routing-instances customer_a vrf-import cust_a_vrf_import

The above configuration does a few things. First, we define the communities we want to look for. Second, we use them in a policy that looks for either community and accepts them. The last statement in the policy rejects everything else. Lastly – we apply that as an import policy into our customer_a VRF.
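For anyone who thinks better in code, here is a rough Python equivalent of that import policy. It is only a sketch: the function mirrors the policy name for readability, and target:65000:99 is a made-up RT used purely to show the reject case. The one assumption worth calling out is that listing two communities in the same from clause behaves as a logical OR.

```python
CUSTOMER_A = "target:65000:1"
CUSTOMER_B = "target:65000:2"


def cust_a_vrf_import(route_communities):
    # term all: "from community customer_a" and "from community customer_b"
    # in the same term are treated as a logical OR (either one matches).
    if CUSTOMER_A in route_communities or CUSTOMER_B in route_communities:
        return "accept"
    # Final "then reject": anything that fell through the term above.
    return "reject"


print(cust_a_vrf_import({"target:65000:1"}))   # accept
print(cust_a_vrf_import({"target:65000:2"}))   # accept
print(cust_a_vrf_import({"target:65000:99"}))  # reject (hypothetical RT)
```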

The result is that we end up importing routes into the customer_a VRF that match either the target:65000:1 RT or the target:65000:2 RT. If we look at the customer_a routing table we can see that is the case…

[email protected]> show route table customer_a.inet.0                            

customer_a.inet.0: 3 destinations, 4 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[Direct/0] 3d 07:48:38
                    > via ge-0/0/0.0
10.10.10.1/32      *[Local/0] 3d 07:48:38
                      Local via ge-0/0/0.0
10.10.20.0/24      *[BGP/170] 00:04:53, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.1 via ge-0/0/1.0, Push 24, Push 299840(top)
                    [BGP/170] 00:04:54, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.1 via ge-0/0/1.0, Push 16, Push 299824(top)

[email protected]> 

This will of course once again break our ping because we can see that the router is preferring the first path. It’s hard to tell from this output why it is, but if we look at the more extensive routing table output we’ll see why…

[email protected]> show route table customer_a.inet.0 10.10.20.0/24 extensive

customer_a.inet.0: 3 destinations, 4 routes (3 active, 0 holddown, 0 hidden)
10.10.20.0/24 (2 entries, 1 announced)
TSI:
KRT in-kernel 10.10.20.0/24 -> {indirect(1048574)}
        *BGP    Preference: 170/-101
                Route Distinguisher: 65000:2
                Next hop type: Indirect, Next hop index: 0
                Address: 0xd008350
                Next-hop reference count: 3
                Source: 7.7.7.7
                Next hop type: Router, Next hop index: 613
                Next hop: 169.254.10.1 via ge-0/0/1.0, selected
                Label operation: Push 24, Push 299840(top)
                Label TTL action: prop-ttl, prop-ttl(top)
                Load balance label: Label 24: None; Label 299840: None; 
                Label element ptr: 0xd008280
                Label parent element ptr: 0xd0081c0
                Label element references: 1
                Label element child references: 0
                Label element lsp id: 0
                Session Id: 0x140
                Protocol next hop: 2.2.2.2
                Label operation: Push 24
                Label TTL action: prop-ttl
                Load balance label: Label 24: None; 
                Indirect next hop: 0xb68d0e0 1048574 INH Session ID: 0x150
                State: <Secondary Active Int Ext ProtectionCand>
                Local AS: 65000 Peer AS: 65000
                Age: 6:10 	Metric2: 1 
                Validation State: unverified 
                Task: BGP_65000.7.7.7.7+179
                Announcement bits (1): 0-KRT 
                AS path: I  (Originator)
                Cluster list:  0.0.0.0
                Originator ID: 2.2.2.2
                Communities: target:65000:2
                Import Accepted
                VPN Label: 24
                Localpref: 100
                Router ID: 7.7.7.7
                Primary Routing Table bgp.l3vpn.0
                Indirect next hops: 1
                        Protocol next hop: 2.2.2.2 Metric: 1
                        Label operation: Push 24
                        Label TTL action: prop-ttl
                        Load balance label: Label 24: None; 
                        Indirect next hop: 0xb68d0e0 1048574 INH Session ID: 0x150
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 169.254.10.1 via ge-0/0/1.0
                                Session Id: 0x140
			2.2.2.2/32 Originating RIB: inet.3
			  Metric: 1			  Node path count: 1
			  Forwarding nexthops: 1
				Nexthop: 169.254.10.1 via ge-0/0/1.0
				Session Id: 0
         BGP    Preference: 170/-101
                Route Distinguisher: 65000:1
                Next hop type: Indirect, Next hop index: 0
                Address: 0xd0085f0
                Next-hop reference count: 2
                Source: 7.7.7.7
                Next hop type: Router, Next hop index: 614
                Next hop: 169.254.10.1 via ge-0/0/1.0, selected
                Label operation: Push 16, Push 299824(top)
                Label TTL action: prop-ttl, prop-ttl(top)
                Load balance label: Label 16: None; Label 299824: None; 
                Label element ptr: 0xd008400
                Label parent element ptr: 0xd007ec0
                Label element references: 1
                Label element child references: 0
                Label element lsp id: 0
                Session Id: 0x140
                Protocol next hop: 3.3.3.3
                Label operation: Push 16
                Label TTL action: prop-ttl
                Load balance label: Label 16: None; 
                Indirect next hop: 0xb68d1f0 1048575 INH Session ID: 0x14f
                State: <Secondary NotBest Int Ext ProtectionCand>
                Inactive reason: Not Best in its group - Router ID
                Local AS: 65000 Peer AS: 65000
                Age: 6:11 	Metric2: 1 
                Validation State: unverified 
                Task: BGP_65000.7.7.7.7+179
                AS path: I  (Originator)
                Cluster list:  0.0.0.0  
                Originator ID: 3.3.3.3
                Communities: target:65000:1
                Import Accepted
                VPN Label: 16
                Localpref: 100
                Router ID: 7.7.7.7
                Primary Routing Table bgp.l3vpn.0
                Indirect next hops: 1
                        Protocol next hop: 3.3.3.3 Metric: 1
                        Label operation: Push 16
                        Label TTL action: prop-ttl
                        Load balance label: Label 16: None; 
                        Indirect next hop: 0xb68d1f0 1048575 INH Session ID: 0x14f
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 169.254.10.1 via ge-0/0/1.0
                                Session Id: 0x140
			3.3.3.3/32 Originating RIB: inet.3
			  Metric: 1			  Node path count: 1
			  Forwarding nexthops: 1
				Nexthop: 169.254.10.1 via ge-0/0/1.0
				Session Id: 0

[email protected]>

The “Inactive reason” line in the output above shows that the second path is not preferred because of the router ID. Since the routes were originated from 2.2.2.2 and 3.3.3.3, the router is preferring the path with the lower router ID (2.2.2.2). This example shows you once again the disconnect between RTs and RDs. The unique RDs of these two routes are what allow them both to be viewed as unique routes on the route reflector (vMX7) and therefore also advertised as unique routes to vMX1. On vMX1, the RDs allow us to view the routes as unique in the bgp.l3vpn.0 table, but the import decision is based strictly on the RTs attached to those routes.
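Here is a small Python sketch of what just happened, using the values from the extensive output. It is heavily simplified: real BGP path selection walks through many earlier steps (local preference, AS path and so on), which all tie in this lab, so the sketch only models the final originator/router ID comparison. The dictionary layout is my own invention for illustration.

```python
import ipaddress

# The two paths from the extensive output above.
paths = [
    {"rd": "65000:2", "prefix": "10.10.20.0/24",
     "originator": "2.2.2.2", "rt": "target:65000:2"},
    {"rd": "65000:1", "prefix": "10.10.20.0/24",
     "originator": "3.3.3.3", "rt": "target:65000:1"},
]

# In bgp.l3vpn.0 the key is RD:prefix, so the two routes never collide.
l3vpn = {f'{p["rd"]}:{p["prefix"]}': p for p in paths}
print(sorted(l3vpn))  # ['65000:1:10.10.20.0/24', '65000:2:10.10.20.0/24']

# In the VRF table the RD is gone, so both paths compete for one prefix.
vrf_table = {}
for p in paths:
    vrf_table.setdefault(p["prefix"], []).append(p)
print(len(vrf_table["10.10.20.0/24"]))  # 2

# Tie-break modeled here: the lowest originator ID wins.
best = min(vrf_table["10.10.20.0/24"],
           key=lambda p: ipaddress.ip_address(p["originator"]))
print(best["originator"], best["rt"])  # 2.2.2.2 target:65000:2
```

The RD keeps the routes apart in bgp.l3vpn.0, but once both are imported into customer_a.inet.0 they are just two paths to the same prefix.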

Before this post gets too long, let’s briefly talk about vrf-export policies as well. To set the stage for that, let’s clean some things up on vMX1. Let’s delete this line…

delete policy-options policy-statement cust_a_vrf_import term all from community customer_b 

This will put things back to a more normal mode of operation preventing the customer_b prefixes from being inserted into the customer_a routing table on vMX1. Our ping should once again be working so let’s talk about vrf-export policies. Let’s start by doing something intentionally wrong once again…

set policy-options policy-statement cust_a_vrf_export term all then community add customer_b
set policy-options policy-statement cust_a_vrf_export term all then accept
set policy-options policy-statement cust_a_vrf_export then reject
set routing-instances customer_a vrf-export cust_a_vrf_export

If we add this configuration on vMX1 and commit it – our ping will stop working again. While the issue causing this lack of connectivity is perhaps more apparent in this example than others – it’s still worth talking through. I mentioned earlier that JunOS makes it obvious that the route targets are extended communities. This is the perfect example of why that’s the case. The export policy adds the targets using the community syntax. Our VRF export above says to add a community for customer_b which by itself should not cause any issues within customer_a. But our ping stopped and if we look at the routing table on vMX3, we’ll see that we no longer have a return route for 10.10.10.0/24

[email protected]> show route table customer_a.inet.0 

customer_a.inet.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.20.0/24      *[Direct/0] 3d 22:22:23
                    > via ge-0/0/0.0
10.10.20.1/32      *[Local/0] 3d 22:22:23
                      Local via ge-0/0/0.0

[email protected]> 

But as some of you may have already guessed, the route does show up in the customer_b routing table on vMX2…

[email protected]> show route table customer_b.inet.0 

customer_b.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[BGP/170] 00:05:04, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.3 via ge-0/0/1.0, Push 16, Push 299808(top)
10.10.20.0/24      *[Direct/0] 00:11:26
                    > via ge-0/0/0.0
10.10.20.1/32      *[Local/0] 00:11:26
                      Local via ge-0/0/0.0

[email protected]> 

But if you recall, we still have a vrf-target defined in the routing instance…

[email protected]> show configuration routing-instances 
customer_a {
    instance-type vrf;
    interface ge-0/0/0.0;
    route-distinguisher 65000:1;
    vrf-import cust_a_vrf_import;
    vrf-export cust_a_vrf_export;
    vrf-target target:65000:1;
    vrf-table-label;
}

[email protected]> 

For folks new to JunOS that can be initially confusing. Logically you might assume that the defined vrf-target would still get appended to the routes being advertised. That is, anything done in my export policy would be in addition to that community being there. The reality is that once you define a vrf-import or vrf-export policy, it overrides the vrf-target. In fact, at this point we can delete the vrf-target…

[email protected]# delete routing-instances customer_a vrf-target target:65000:1 

[edit]
[email protected]# commit 
commit complete

[edit]
[email protected]# show routing-instances 
customer_a {
    instance-type vrf;
    interface ge-0/0/0.0;
    route-distinguisher 65000:1;
    vrf-import cust_a_vrf_import;
    vrf-export cust_a_vrf_export;
    vrf-table-label;
}

[edit]
[email protected]# 

So if the vrf-export overrides our vrf-target, then we need to add all route targets that we want as part of the vrf-export policy. If we add…

set policy-options policy-statement cust_a_vrf_export term all then community add customer_a

Then our route will be exported with both RTs…

[email protected]> show route advertising-protocol bgp 7.7.7.7 extensive 

customer_a.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
* 10.10.10.0/24 (1 entry, 1 announced)
 BGP group internal type Internal
     Route Distinguisher: 65000:1
     VPN Label: 16
     Nexthop: Self
     Flags: Nexthop Change
     Localpref: 100
     AS path: [65000] I 
     Communities: target:65000:1 target:65000:2

[email protected]> 

And our ping will once again start working since vMX3 (with its vrf-target configuration for target:65000:1) will once again start importing the route from vMX1…

[email protected]> show route receive-protocol bgp 7.7.7.7 table bgp.l3vpn.0 extensive 

bgp.l3vpn.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
* 65000:1:10.10.10.0/24 (1 entry, 0 announced)
     Import Accepted
     Route Distinguisher: 65000:1
     VPN Label: 16
     Nexthop: 1.1.1.1
     Localpref: 100
     AS path: I  (Originator)
     Cluster list:  0.0.0.0
     Originator ID: 1.1.1.1
     Communities: target:65000:1 target:65000:2

[email protected]> show route table customer_a.inet.0                                     

customer_a.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[BGP/170] 00:02:06, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.5 via ge-0/0/1.0, Push 16, Push 299808(top)
10.10.20.0/24      *[Direct/0] 3d 23:05:46
                    > via ge-0/0/0.0
10.10.20.1/32      *[Local/0] 3d 23:05:46
                      Local via ge-0/0/0.0

[email protected]> 

This should again reinforce the idea that RTs are just extended communities, as we can have multiple RTs on a single route.
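To tie the export examples together, here is one more hedged Python sketch. It models the three states we walked through on vMX1: no export policy, an export policy that only adds customer_b, and an export policy that adds both communities. The function names and the labels for the remote VRFs are hypothetical. The override behavior is modeled from what we observed in the lab, not from Junos internals.

```python
def exported_communities(vrf_target, export_policy_adds=None):
    # With a vrf-export policy, only the communities the policy adds are
    # attached; the vrf-target is NOT appended on top of them.
    if export_policy_adds is not None:
        return set(export_policy_adds)
    return {vrf_target}


def importing_vrfs(communities, vrf_import_targets):
    # Names of the remote VRFs whose import target is on the route.
    return sorted(name for name, target in vrf_import_targets.items()
                  if target in communities)


remote = {
    "customer_a@vMX3": "target:65000:1",
    "customer_b@vMX2": "target:65000:2",
}

no_policy = exported_communities("target:65000:1")
only_b = exported_communities("target:65000:1", ["target:65000:2"])
both = exported_communities("target:65000:1",
                            ["target:65000:1", "target:65000:2"])

print(importing_vrfs(no_policy, remote))  # ['customer_a@vMX3']
print(importing_vrfs(only_b, remote))     # ['customer_b@vMX2']
print(importing_vrfs(both, remote))       # ['customer_a@vMX3', 'customer_b@vMX2']
```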

I’m hoping that the above examples nailed down what route targets and route distinguishers are used for. More importantly, I hope the examples showed that while they are related they serve two totally different purposes. In the next post, we’re going to dive a little deeper down the rabbit hole.
