MPLS 101 – MPLS VPNs

In our last post, we removed our last piece of static configuration and replaced static routes with BGP. We’re going to pick up right where we left off and discuss another use case for MPLS – MPLS VPNs. Really – we’re talking about two different things here. The first is the BGP VPNv4 address family used for route advertisement. The second is using MPLS as a data plane to reach the prefixes being announced by the VPNv4 address family. If that doesn’t make sense yet – don’t worry – it will be pretty clear by the end of the post. So as usual – let’s jump right into this and talk about our lab setup.

As I mentioned in the last post, setting up BGP was a prerequisite to this post, so I’m going to pick up right where I left off. I’ll post the lab topology picture here for the sake of posting a lab topology – but if you want to get your configuration prepped – take a look at the last post. At the end of the last post we had our two clients talking to one another using MPLS as the data plane forwarding mechanism and BGP as the route or prefix advertisement mechanism.

To do this – we peered router 1 to router 4 with BGP and let them advertise the directly connected prefixes that met a prefix-list based filter. This worked great and removed the need for us to use static routes to force a recursive routing lookup into the inet.3 table so that we could use MPLS labels for forwarding. But now what happens if we want to support more than one tenant? Let’s say that I’m running an ISP and I want to provide connectivity between different endpoints in the network for different tenants. And what if those tenants use the same IP networks? The only way to solve that problem is to provide a means of layer 3 isolation on my network so each tenant can stay isolated. So in our example above, let’s say that client 1 and client 2 are members of company ABC and we’re tasked with providing them an isolated path across my amazing 4 router backbone network. To do this, there are two main problems we have to solve.

  • Distributed L3 Isolation – How do we extend that isolation across the backbone so that company ABC can reach all of their locations?
  • Local L3 Isolation – How do we provide local isolation on the router where the client enters the backbone?

So let’s tackle these problems one at a time.

Distributed L3 isolation

Up until now we’ve been using the default or global table on each router to provide routing functionality. If we look at the routing table on router 1 we’ll see all of the OSPF based loopback addresses as well as the remote BGP prefix we learned from router 4…

Bottom line – we’re dealing with a common routing table. To provide isolation, we’ll need to fix this. So the first thing we need to do is figure out how we do this with BGP. At the moment, BGP is talking between router 1 and router 4 in the default or global table. Prefixes it advertises to other peers live here and prefixes it learns land here. So let’s back up a second. We know that we want to support multiple customers – and those customers could use the same prefixes – so how can we possibly have BGP advertising the same prefixes for different customers? The answer comes in the form of VPNv4 prefixes. You see – to make a routing advertisement unique, VPNv4 prefixes just add additional distinguishing information onto a route advertisement. This distinguishing information is aptly called a “route distinguisher” or more commonly just an “RD”. An RD is a 64-bit value that prefaces a customer’s routing advertisement. For instance…

You can see above that to make a standard prefix a VPNv4 prefix we simply preface it with an RD. That RD can come in three different forms…
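To make the “RD plus prefix” idea concrete, here is a small Python sketch of the three RD encodings defined in RFC 4364 (type 0 is a 2-byte AS number plus a 4-byte value, type 1 is an IPv4 address plus a 2-byte value, type 2 is a 4-byte AS number plus a 2-byte value). The function names are mine, and the second RD (65000:250) is a made-up value for a hypothetical second tenant, so treat this as an illustration rather than anything a router actually runs.

```python
import ipaddress
import struct


def encode_rd(rd: str) -> bytes:
    """Pack a textual RD into its 8-byte form: 2-byte type + 6-byte value."""
    admin, assigned = rd.rsplit(":", 1)
    number = int(assigned)
    if "." in admin:
        # Type 1: 4-byte IPv4 address, 2-byte assigned number
        return struct.pack("!H4sH", 1, ipaddress.IPv4Address(admin).packed, number)
    asn = int(admin)
    if asn <= 0xFFFF:
        # Type 0: 2-byte AS number, 4-byte assigned number
        return struct.pack("!HHI", 0, asn, number)
    # Type 2: 4-byte AS number, 2-byte assigned number
    return struct.pack("!HIH", 2, asn, number)


def vpnv4_key(rd: str, prefix: str):
    """A VPNv4 prefix is the RD followed by the ordinary IPv4 prefix."""
    return encode_rd(rd), ipaddress.IPv4Network(prefix)


# Same customer prefix, two tenants. 65000:150 is the RD used in this lab;
# 65000:250 is a made-up RD for a hypothetical second tenant.
abc = vpnv4_key("65000:150", "10.2.2.0/31")
other = vpnv4_key("65000:250", "10.2.2.0/31")

print(abc[0].hex(), abc[1])
print("same VPNv4 prefix?", abc == other)
```

The IPv4 part of both keys is identical, but the keys as a whole differ, which is all the RD is there to do.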

The form you use doesn’t matter much but you should use the same form for consistency’s sake. So now the question becomes, how do we get BGP to send and understand VPNv4 prefixes? To do that, we need to activate a new address family – VPNv4. This is easy to do, we simply enable the inet-vpn family under the ibgp group we created earlier…

Let’s add the same change to router 4 as well…

set protocols bgp group ibgp family inet-vpn unicast

So now what?  Surprisingly, that’s sort of it.  Feel like you missed something?  Don’t worry!  It will all come together soon, but we need to tackle the next task of providing local L3 isolation before we can see any real progress.  So let’s get that out of the way and then see where we stand.

Local L3 Isolation

Providing local layer 3 isolation on a router is a pretty simple thing to do in most cases. Both Juniper and Cisco allow you to define VRF (Virtual Routing and Forwarding) instances which create isolated routing tables. You can then add interfaces from the router/switch into a given VRF. By default, all interfaces live in the default (sometimes called global) routing table. Moving them to a VRF removes them from the default routing table and places them in the VRF’s routing table. Let’s take a look at what it would take to add a Company ABC VRF on router 1 and 4 so I can show you what I mean.

Even in this mode, where we are just defining local VRFs (sometimes called VRF-lite mode), there are some differences in how you define them between Juniper and Cisco. In Cisco – a VRF is a VRF regardless of whether you’re using it locally or intend to use it in conjunction with something like MPLS and VPNv4. On a Juniper, there are two routing-instance types for VRFs. One for VRF-lite mode and one for using VRFs in conjunction with VPNv4…

The two we’re going to concern ourselves with are types virtual-router and vrf. virtual-router defines the VRF-lite mode where the VRF is only local to the box. Since we know we want distributed isolation, we’re going to need to use the vrf instance type. Let me show you why. Here’s the definition of a simple routing-instance…

Notice that the definition is incredibly simple. We define the name, the type, and the interface we want in it. This configuration is valid and passes a commit check. Let’s commit it and take a quick look around…

Once committed – we can see that we now have a new routing table on the router named company_abc.inet.0. That routing table has the prefixes associated with ge-0/0/0 in it. So if we look at the global or default routing table, we’ll notice these are now gone…

So this means that any traffic that comes into this IP interface on this router will only be allowed to talk to other prefixes in this table. At this point – there aren’t any. So how do we get some? BGP and its VPNv4 route advertisements to the rescue!
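If it helps to see the isolation as data, here is a toy Python model of what moving an interface into a routing instance does. The Router class and its methods are invented for this sketch, and I am assuming router 1 holds 10.2.2.0/31 on ge-0/0/0, 10.1.1.0/31 on ge-0/0/1 and 1.1.1.1/32 on lo0, which matches the lab as far as I can tell from the outputs. A real router obviously does far more than this.

```python
import ipaddress


class Router:
    def __init__(self):
        # Every interface starts out in the default (global) table, inet.0.
        self.tables = {"inet.0": {}}
        self.interface_table = {}

    def add_interface(self, name, prefix):
        self.tables["inet.0"][ipaddress.ip_network(prefix)] = name
        self.interface_table[name] = "inet.0"

    def move_to_instance(self, name, instance):
        """Move an interface and its prefixes into <instance>.inet.0."""
        new = f"{instance}.inet.0"
        old = self.interface_table[name]
        self.tables.setdefault(new, {})
        for prefix, ifname in list(self.tables[old].items()):
            if ifname == name:
                self.tables[new][prefix] = self.tables[old].pop(prefix)
        self.interface_table[name] = new

    def lookup(self, in_interface, destination):
        """Longest-prefix match, but only in the ingress interface's table."""
        table = self.tables[self.interface_table[in_interface]]
        addr = ipaddress.ip_address(destination)
        matches = [p for p in table if addr in p]
        if not matches:
            return None
        return table[max(matches, key=lambda p: p.prefixlen)]


r1 = Router()
r1.add_interface("lo0", "1.1.1.1/32")
r1.add_interface("ge-0/0/0", "10.2.2.0/31")   # client facing
r1.add_interface("ge-0/0/1", "10.1.1.0/31")   # towards router 2
r1.move_to_instance("ge-0/0/0", "company_abc")

print(sorted(r1.tables))
print(r1.lookup("ge-0/0/0", "10.2.2.1"))  # reachable inside the VRF
print(r1.lookup("ge-0/0/0", "1.1.1.1"))   # global prefixes are invisible
print(r1.lookup("ge-0/0/1", "10.2.2.1"))  # and the VRF prefix left inet.0
```

The last two lookups return nothing, which is the “there aren’t any other prefixes in this table” situation described above.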

In the last section, we talked about the need for VPNv4 routes, and we learned how to enable them in BGP, but we didn’t really define any. That’s because RDs are defined as part of a routing-instance. Here’s where we’ll see the difference between the virtual-router and the vrf instance type with Juniper. If we try to define an RD in our existing routing-instance, we’ll see this…

So as it turns out we need to use the vrf instance-type in order to define an RD.  So let’s change our configuration to look like this…

Notice we changed the type to vrf and added a route-distinguisher of 65000:150.  Now if we try and commit this we’ll see…

So now it also wants a route target. What’s that all about? I mean – we already made the routes unique by tacking on an RD. So what purpose does a route-target serve? Route targets (known as RTs) are what is actually used to make policy decisions about a prefix. Whereas RDs serve only to make a route unique – RTs are the gatekeepers to a VRF and what the router uses to decide what route advertisements can/should go where. What often confuses people is that the definition of an RT and RD can be identical if you wish. For the sake of this lab we’ll make them different but do note that RTs follow the same type definition/format as RDs as we showed in the earlier section. So let’s add an RT to our definition…

Once again – let’s now make similar changes on router 4 (I’ll provide these in set syntax so you can see the actual commands I used)…

Note that in my lab the customer facing interface for client 1 and client 2 differs so you can’t copy and paste this configuration to both routers without changing the interface. 

So now that this is all in place – we can do some validation.  Let’s start on router 1…

Notice that under the peer 4.4.4.4 we now see the additional bgp.l3vpn.0 family. The fact that it’s listed under the peer tells us that router 1 has negotiated this functionality with router 4. In other words – they’re talking VPNv4 at this point. We can also see what prefixes are being advertised by router 1…

Above we can see that for the BGP peering to router 4 (4.4.4.4) router 1 lists the 10.2.2.0/31 prefix as part of the company_abc.inet.0 table.  However – notice that the next-hop of this prefix is listed as Not Advertised.  We can confirm that this is not being advertised by running a similar command on router 4…

We can see here that router 4 is not receiving any prefixes from router 1 (1.1.1.1). So what’s going on? Shouldn’t this all work? Let’s take a closer look at the advertised routes on router 1 by tacking on the extensive keyword to the advertised routes command…

Huh – so it’s saying that it failed to allocate a next-hop address on LAN. What does that mean? Let’s take a step back and talk about what’s going on right now and see if we can sort this out.

We mentioned earlier that there were two separate goals here.  One was to enable Distributed L3 isolation across our MPLS cloud and the second was to do it locally on each router that clients connect to.  We talked about how to enable BGP to advertise unique prefixes per customer (or VPN) but we didn’t talk about how that ties into MPLS to make a working data plane.  So how does this all work together?  The answer comes in the form of a label stack.  Let’s talk through what we think should be happening and then see if we can validate this in the lab.

So the first thing we need to know is that MPLS labels can be stacked.  In all of our previous posts, there was a single MPLS label and it was swapped at each MPLS hop.  This is how MPLS forwards traffic.  Adding more than one label opens up some interesting possibilities in terms of applications.  In the case of VPNv4 route advertisements, it means I can tell remote peers about an additional label.  So let’s walk through how this might work…

The picture above shows two label advertisements. The first is the LDP label advertisement shown with the blue arrows. As we saw in the post on LDP this label distribution occurs hop by hop and each label is locally significant. In this case, router 4 is telling router 3 – “If you want to get to an LSP that ends at me (4.4.4.4) send me a frame with a top label of 3”. Router 3 records that request in its local label database – and then tells its other LDP peers about this. Router 3 tells router 2 – “If you want to get to an LSP that ends at router 4 send me a frame with the top label of 299808”. And finally – router 2 tells router 1 – “If you want to get to an LSP that ends at router 4 send me a frame with the top label of 299936”. This is how LDP works and it’s well described in the earlier post. What’s new is the green line shown above. This is a label allocated and learned by BGP and advertised to all VPNv4 capable BGP peers. So how does this new label work?

  • When the top client wants to talk to the bottom client it will send traffic to router 1.  That interface is part of a local VRF (company_abc VRF) on the router and that VRF has an associated RD and RT.
  • Router 1 will do a route lookup in the company_abc routing table and see if it has a matching destination prefix for the traffic.  In this case, it would have learned the prefix 10.2.2.2/31 through BGP (specifically through the BGP VPNv4 advertisements).  If we examine that advertisement sent by router 4 for 10.2.2.2/31 we should see that the advertisement includes a VPN label. 
  • This VPN label will be pushed onto the frame’s label stack first.  Since the LSP for this traffic still ends at router 4 – the LDP based label will then also be pushed on the label stack. 
  • When router 2 receives the MPLS frame, it will only process the top label leaving the bottom label (the VPN label in this case) intact.  Router 2 will process the frame, swap the top label, and send it on its way. 
  • Each subsequent MPLS hop will do the same thing, only dealing with the top label.
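The steps above can be walked through in a few lines of Python. The label values (16 for the VPN label, 299936, 299808 and 3 for the LDP labels) are the ones from this lab. The helper names are mine, and the model ignores everything except the label stack itself.

```python
IMPLICIT_NULL = 3  # reserved label: "pop the top label before sending to me"


def push(stack, label):
    """Put a new label on top of the stack (index 0 is the top)."""
    return [label] + stack


def transit(stack, label_from_next_hop):
    """A P router only touches the top label: swap it, or pop it for PHP."""
    if label_from_next_hop == IMPLICIT_NULL:
        return stack[1:]
    return [label_from_next_hop] + stack[1:]


VPN_LABEL = 16            # advertised by router 4 in the VPNv4 route
on_wire = {}

# Router 1 (ingress PE): VPN label first, then the LDP label for 4.4.4.4.
stack = push(push([], VPN_LABEL), 299936)
on_wire["r1->r2"] = stack

# Router 2: swaps the top label for the one router 3 advertised.
stack = transit(stack, 299808)
on_wire["r2->r3"] = stack

# Router 3: router 4 advertised label 3, so the top label is popped.
stack = transit(stack, IMPLICIT_NULL)
on_wire["r3->r4"] = stack

for link, labels in on_wire.items():
    print(link, labels)
```

Note that the VPN label never changes along the way. Only routers 1 and 4 know what it means.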

The flow looks like this…

Pretty slick huh? Notice that since router 4 had sent router 3 the label of 3 (Implicit Null) to reach 4.4.4.4, router 3 has obliged and popped the top label off of the stack. In a scenario without MPLS VPNs, this would mean that router 4 would receive a native Ethernet frame and IP packet. However – even after popping the top label we still have the VPN label to contend with. This puts us in a little bit of a unique scenario and helps explain why things aren’t currently working.

In a typical MPLS configuration, the clients would more likely be CE or Customer Edge routers. Router 1 and router 4 would be serving as PEs or Provider Edge routers. In that scenario it is almost guaranteed that router 1 and router 4 would be routing to reach remote prefixes. That is, the link between the PE and the CE would be a transit network and the PE would be learning prefixes from the CE over some sort of dynamic routing protocol. In that scenario, JunOS can advertise a label for each potential next hop address it needs to talk to. That means when it receives a frame with a VPN label it can directly translate this to a next hop and forward the packet.

In our case – the destinations are directly attached to a multi-access network. This means that the next-hop is not well known and that the router will need to perform normal ARP discovery to find the MAC address for the destination. So rather than a direct label to next hop mapping, we’re asking the router to perform two different actions when it receives a VPN labelled frame. First it has to use the VPN label to determine what VRF the traffic should land in – then it needs to ARP to find the destination. That is why this is currently not working – the routers recognize that this is a problem and therefore won’t allocate a VPN label for the customer prefix. Hence the message Need a nexthop address on LAN. There are a couple of ways to solve this and this Juniper article does a good job explaining them.

In our case, we’re going to use the vrf-table-label command to fix this. In addition to changing the label allocation method for VPNv4 prefixes – this also allows traffic to be recirculated on the router so that the VPN label can be popped and the ARP process can happen. So let’s add the following configuration to router 1 and router 4…

Now if we try to ping client 1 from client 2 – we should see it’s working as expected…

Let’s look at what’s happening in more detail…

Notice that router 1 now shows it is advertising the prefix to router 4. Also note that it’s allocated a VPN label of 16 to use for this VRF.  The other impact that vrf-table-label has is that it allocates a single label for all prefixes for a given VRF (which makes way more sense given the command you entered).  So that means that all prefixes on router 1 that are a part of company_abc VRF will be advertised with a label of 16 to router 4.  Let’s look and see what we learned from router 4…

Here we can see that to reach the network 10.2.2.2/31 router 1 believes it should impose a bottom label of 16 and a top label of 299936.  Just like we expected it would!  Let’s verify that 16 is in fact the VPN label that router 4 sent to router 1….

Note: This can be one of the confusing things about small MPLS labs. Oftentimes, the same label is allocated on different boxes. In our case – both router 1 and router 4 allocated a per VRF label of 16 for their company_abc VRFs.

Lots of interesting information here.  In the top highlighted section we can see that the company_abc routing table did receive an entry for 10.2.2.2/31 with a VPN label of 16, an RD of 65000:150, and next hop of router 4 (4.4.4.4).  Notice that the RT is listed as a community on that advertisement.  Recall that the RT is how we define routing policy in terms of what we import and export into a VRF.  At this point – since we haven’t applied any import or export policy for the VRF – JunOS will happily receive and advertise all advertisements in a given VRF.  We’ll look at this more in a later post.
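To illustrate the “RT as gatekeeper” idea, here is a rough Python sketch of a VRF picking routes out of the shared VPNv4 table based on the RT community. The RD, prefix, next hop and label come from the output above. The RT values (target:65000:1500 and target:65000:2500) and the second tenant are placeholders I made up, since the real RT lives in the config listing. The class and function names are also just for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vpnv4Route:
    rd: str
    prefix: str
    next_hop: str
    vpn_label: int
    route_targets: frozenset


def import_into_vrf(l3vpn_table, import_targets):
    """Keep only routes carrying at least one RT that the VRF imports."""
    return {r.prefix: r for r in l3vpn_table if r.route_targets & import_targets}


# Everything received over VPNv4 lands in one shared table first.
bgp_l3vpn = [
    Vpnv4Route("65000:150", "10.2.2.2/31", "4.4.4.4", 16,
               frozenset({"target:65000:1500"})),   # company ABC (RT assumed)
    Vpnv4Route("65000:250", "10.2.2.2/31", "4.4.4.4", 17,
               frozenset({"target:65000:2500"})),   # hypothetical tenant
]

company_abc = import_into_vrf(bgp_l3vpn, {"target:65000:1500"})

for prefix, route in company_abc.items():
    print(prefix, "via", route.next_hop, "VPN label", route.vpn_label)
```

Both routes carry the same IPv4 prefix, but only the one tagged with the RT that company_abc imports ends up in its table.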

The second highlighted section shows the entry in the bgp.l3vpn.0 table. This table stores all VPNv4 advertisements received by the router. In a later post where we look at having multiple customers on the same MPLS backbone we’ll see that all of their advertisements land in this table before specific entries make it into the customer specific VRF tables. So I think we can safely update our diagram to look like this…

Last but not least, let’s do some packet captures to validate this is actually what’s hitting the wire…

ICMP echo request on the wire between routers 1 and 2…

ICMP echo request on the wire between routers 2 and 3…

ICMP echo request on the wire between routers 3 and 4…

The capture confirms the flow we described in the diagram perfectly!  In the above captures – the first label listed (top down) is the top label.  You can also tell that the VPN label of 16 is the bottom label since that label has the bottom of the label stack bit set to 1.  Notice that in the capture between router 3 and 4 we only have one label.  Recall that router 4 is asking router 3 to perform PHP to remove the LDP label before forwarding it the frame.
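If you want to check the bottom of stack bit by hand, each MPLS label stack entry is a 32-bit word laid out as a 20-bit label, 3 traffic class bits, 1 bottom of stack bit and an 8-bit TTL (RFC 3032). Below is a small Python sketch that builds and decodes a two label stack like the one between routers 1 and 2. The TTL of 64 and the payload bytes are arbitrary values for the example, and the function names are mine.

```python
import struct


def encode_entry(label, bottom, ttl=64, tc=0):
    """Build one 4-byte label stack entry."""
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def decode_label_stack(data):
    """Read entries until the bottom of stack bit is set; return (entries, rest)."""
    entries = []
    offset = 0
    while True:
        (word,) = struct.unpack_from("!I", data, offset)
        entry = {
            "label": word >> 12,
            "tc": (word >> 9) & 0x7,
            "bottom": bool((word >> 8) & 0x1),
            "ttl": word & 0xFF,
        }
        entries.append(entry)
        offset += 4
        if entry["bottom"]:
            return entries, data[offset:]


# Top label first, exactly as it appears on the wire between routers 1 and 2.
frame = encode_entry(299936, bottom=False) + encode_entry(16, bottom=True) + b"ip-packet"
entries, payload = decode_label_stack(frame)

for e in entries:
    print(e)
print(payload)
```

Only the entry for label 16 has the bottom flag set, which is how a parser knows the IP packet starts right after it.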

This has been a warp speed tour of MPLS VPNs and we relied on lots of the default functionality to make this work.  In the next post, I hope to cover more specifics about VRF import and export based on the RTs.  Stay tuned!

In our last post we talked about how to make the MPLS control plane more dynamic by getting rid of static LSPs and adding in LDP to help advertise and distribute LSPs to all MPLS speaking routers. However – even once we got LDP up and running, we still had to tell the routers to use a given LSP. In the last post, we accomplished this by adding recursive static routes in the inet.0 table to force the routers to recurse to the inet.3 table where the MPLS LSPs lived. In this post, we’re going to tackle getting rid of the static routes and focus on replacing them with a dynamic routing protocol – BGP.

So to start off with, let’s get our lab back to a place where we can start. To do that, we’re going to load the following configuration on each router shown in the following lab topology…

You’ll notice that these base configurations include configuration for not only the interfaces, but also OSPF, LDP, and MPLS.  Basically we’ve configured everything we had in the last post with the exception of the static routes.  At this point, client 1 and client 2 will not be able to communicate to one another.  So let’s now look at how we configure BGP.

You might recall that I mentioned in a previous post that BGP was interesting in that it could list a next hop for a destination prefix that was not directly connected to the router. This is because BGP can leverage recursion which is exactly what we had been doing with the static routes in the previous two posts. So let’s go ahead and set up a BGP peering to try and replace what we did with static routes previously. The first thing we need to do is define what our BGP peers will be and what AS (autonomous system) each of them resides in. In our case, to keep things simple initially, we’re going to put all of our peers in the same AS (65000) for now. As for the peers, we’re going to peer router 1 to router 4. You might be wondering why we aren’t peering routers 2 and 3 with anything. Recall that those are MPLS P routers and don’t require any knowledge of the actual prefixes they are passing traffic for.
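Here is a rough Python picture of the recursion we are after on router 1. The table contents are taken from values that show up in the outputs of this series (the 10.2.2.2/31 prefix learned from 4.4.4.4, label 299936, and the next hop 10.1.1.1 out of ge-0/0/1.0). The dictionaries and function names are a simplification I made up, and real next-hop resolution on a router has more rules than this.

```python
import ipaddress

# inet.0: the BGP route only knows a protocol next hop, not an interface.
inet0 = {
    ipaddress.ip_network("10.2.2.2/31"): {"protocol": "BGP",
                                          "protocol_next_hop": "4.4.4.4"},
}

# inet.3: LDP learned LSPs, keyed by the loopback at the far end of the LSP.
inet3 = {
    ipaddress.ip_network("4.4.4.4/32"): {"push": 299936,
                                         "via": "10.1.1.1",
                                         "interface": "ge-0/0/1.0"},
}


def longest_match(table, address):
    addr = ipaddress.ip_address(address)
    matches = [p for p in table if addr in p]
    return table[max(matches, key=lambda p: p.prefixlen)] if matches else None


def resolve(destination):
    """Look up the prefix, then recurse on its next hop into inet.3."""
    route = longest_match(inet0, destination)
    if route is None:
        return None
    return longest_match(inet3, route["protocol_next_hop"])


print(resolve("10.2.2.3"))    # client 2's network resolves to an LSP
print(resolve("192.0.2.1"))   # unknown destination, nothing to resolve
```

Routers 2 and 3 never appear in the first table at all, which is why they do not need to run BGP.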

If you’re familiar with BGP, then there shouldn’t be anything terribly new for you here. Juniper configuration breaks the configuration into groups. In my case, I have a single group called ibgp in which I define the type of peering that I’m working with (internal) as well as my neighbor, the local address I want to use to talk to that neighbor, and the peer autonomous system. Juniper also breaks out the AS and router-id into the routing-options stanza which can take some getting used to if you’re used to working on other platforms.

Now that it’s configured, we should be able to check and see that we have a BGP peering between router 1 and router 4…

Looks like the peering is up as we can see in the Up/Dwn column that we list an up time and the state is not Active. However – we don’t appear to be getting any routes. We can see that the number of received routes is still 0. So why is that? Well it’s because we haven’t told BGP about any of the routes. To do that, we need to define a routing policy that we can then have BGP reference as an export policy. Here are two basic policies that we’ll be using in this example…

If you’re not familiar with Juniper policies (terms, from, then, etc) I’d suggest you spend some time reading through the Juniper Day One: Configuring Junos Policy and Firewall Filters eBook.  It’s a great read and has lots of good examples.  While the logic may seem basic at first, I assure you that there are lots of places you can shoot yourself in the foot if you don’t understand the defaults and the means in which routers process policy.  So let’s walk through the above policy logic as applied on router 1 briefly…

  • We start with a policy-statement called mpls_bgp
    • This has a term defined in it called routes_for_mpls which defines two match criteria…
      • Routes that are from the protocol direct which are routes the router believes to be directly connected
      • Routes that match a route filter looking for the exact prefix 10.2.2.0/31
      • If the two above requirements are met, we proceed to the then action which is to accept or allow the advertisement.
    • If the above term is not matched, the policy has a default action to reject
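The policy walk-through above boils down to a very small piece of logic, sketched here in Python. This only mirrors the bullets for router 1 (match on protocol direct and the exact prefix 10.2.2.0/31, otherwise reject). It is not how Junos evaluates policy internally, and the dictionary based route format is something I made up for the example.

```python
import ipaddress

EXACT_PREFIX = ipaddress.ip_network("10.2.2.0/31")


def mpls_bgp(route):
    """One term (routes_for_mpls), then the fall-through reject."""
    # term routes_for_mpls: both "from" conditions have to match
    is_direct = route["protocol"] == "direct"
    is_exact = ipaddress.ip_network(route["prefix"]) == EXACT_PREFIX
    if is_direct and is_exact:
        return "accept"
    # nothing matched, so the route is not advertised
    return "reject"


candidates = [
    {"protocol": "direct", "prefix": "10.2.2.0/31"},  # client facing link
    {"protocol": "direct", "prefix": "10.1.1.0/31"},  # core link
    {"protocol": "ospf", "prefix": "4.4.4.4/32"},     # learned loopback
]

for route in candidates:
    print(route["prefix"], route["protocol"], mpls_bgp(route))
```

Only the first candidate is accepted, so only the client facing prefix would be handed to BGP for export.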

Pretty straightforward right? Then all we do is apply the routing-policy to the BGP process by tagging it as the export policy underneath the BGP protocol stanza. If you’re following along and have applied the policy to both routers you should see that client 1 can now ping client 2 once again. We can also do some validation on the routers to confirm they have the appropriate prefixes…

First we look to ensure that we see the destination prefix that client 2 is a member of (10.2.2.2/31) on router 1. We see that we do in fact have a routing entry for the 10.2.2.2/31 prefix and that it lists it coming from 4.4.4.4 (router 4). We also see that recursion has occurred and router 1 has gleaned the MPLS information from the inet.3 table in order to know what MPLS label to push onto the traffic. With that information, as well as the next-hop interface and IP to reach 4.4.4.4 from OSPF (10.1.1.1 and ge-0/0/1.0) we have all the information we need to send the traffic on its way towards client 2. We can also validate that we’re advertising the 10.2.2.0/31 prefix to router 4 by using the advertising-protocol command syntax on router 1…

As before – the advantage here is that routers 2 and 3 have no idea about the prefixes which routers 1 and 4 are advertising to each other…

So at this point, the only thing we’d need to do is add any additional prefixes to the routing-policy in order to have BGP dynamically advertise them to the other BGP peer. This doesn’t buy us too much over what we had with static routes, but it does set us up to talk about the topic for our next post – VPNv4 with MPLS. Stay tuned!

In our last post, we saw a glimpse of what MPLS was capable of.  We demonstrated how routers could forward traffic to IP end points without looking at the IP header.  Rather, the routers performed label operations by adding (pushing), swapping, or removing (popping) the labels on and off the packet.  This worked well and meant that the core routers didn’t need to have IP reachability information for all destinations.  However – setting this up was time consuming.  We had to configure static paths and operations on each MPLS enabled router.  Even in our small example, that was time consuming and tedious.  So in this post we’ll look at leveraging the Label Distribution Protocol (LDP) to do some of the work for us.  For the sake of clarity, we’re going to once again start with a blank slate.  So back to our base lab that looked like this…

Note: I refer to the devices as routers 1-4 but you’ll notice in the CLI output that their names are vMX1-4.

Each device had the following base configuration…

Again – nothing exciting here. The only difference between this base config and the one in the previous post was that I’ve given all of the routers loopback addresses rather than just routers 1 and 4. But while we’re at it – we know from the previous post that to use MPLS we need to turn MPLS on each interface which will pass labelled traffic.  So as we did in our first post, let’s once again enable MPLS on all of the interfaces that we expect to pass labelled traffic…

So now we’re back to pretty much where we were in the last post before we started programming our static LSP. As you’ll recall – that was tedious and we had to manually generate and track what labels we were using. To overcome this – a group of protocols exists called “label distribution protocols” which takes care of much of this tedious work for us. The two primary label distribution protocols are LDP (Label Distribution Protocol) and RSVP (Resource Reservation Protocol). In this post, we’ll be covering LDP since it’s the easiest to get up and running with and we’ll tackle RSVP in an upcoming post.

Enabling LDP is just as easy as enabling MPLS was.  We simply turn it on for any interface which we expect to handle labelled packets…

Now we can verify that LDP is on with various show commands…

Notice that router 1 believes it has a single LDP interface which is accurate. Also notice that LDP has identified the IP address of this interface. We’ll also note that the neighbor count (Nbr count) is listed as 1 so if we run the same command on router 2 we should see that its interfaces are now also enabled for LDP…

Indeed it is – so the neighbor count being 1 implies that there is an LDP relationship between router 1 and router 2 on their directly connected interfaces. If we performed the same command on router 3 and router 4 we would see similar output. You might be wondering what the Label space ID field is. This is often referred to simply as the LDP ID and consists of two components separated by a colon. The first piece is the neighbor’s transport address. In the above output, you can see that each router is using its own local loopback address (2.2.2.2 in the case of router 2) for each of its LDP enabled interfaces. If we look at the LDP neighbors for router 2 we’ll see the LDP ID for its adjacent neighbors…

As you can see – router 2 has two neighbors, 10.1.1.0 (ge-0/0/1 on router 1) and 10.1.1.3 (ge-0/0/0 on router 3). Notice that each neighbor’s LDP ID starts with the loopback address of the neighboring router: 1.1.1.1 for router 1 and 3.3.3.3 for router 3. With MPLS (especially on Juniper) it’s common for a node’s transport address to be the loopback address given that the transport address is expected to be unique. The second part of the LDP ID in all cases appears to be 0. So what does that mean?

The number after the colon indicates the means in which the node is allocating MPLS labels.  0 means that the node is using what is called a per-router (sometimes also called per-platform) label allocation method which is the default on Juniper devices.  This means that regardless of interface – all labels advertised for FECs need to be unique.  The other option is per-interface label space and means that in addition to labels you also track the interface they are associated with.  That is – with per-interface label space you could advertise the same label, out of two different interfaces, for two different FECs.  With per-router label space you would need to advertise two unique labels one for each FEC.  For now we’ll be dealing only with per-router label space.
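A quick Python sketch of the difference between the two label spaces might help here. Splitting the LDP ID on the colon is straightforward. The LabelSpace class and the label value 100 are invented for the example and are only meant to show what “unique per router” versus “unique per interface” means.

```python
def parse_ldp_id(ldp_id):
    """Split '2.2.2.2:0' into the address part and the label space number."""
    address, label_space = ldp_id.rsplit(":", 1)
    return address, int(label_space)


class LabelSpace:
    def __init__(self, per_interface):
        self.per_interface = per_interface
        self.bindings = {}

    def bind(self, interface, label, fec):
        """Return True if the binding is allowed, False if the label is taken."""
        # Per-router space ignores the interface; per-interface keeps it.
        key = (interface, label) if self.per_interface else label
        return self.bindings.setdefault(key, fec) == fec


print(parse_ldp_id("2.2.2.2:0"))

per_router = LabelSpace(per_interface=False)
print(per_router.bind("ge-0/0/0", 100, "1.1.1.1/32"))  # first use of label 100
print(per_router.bind("ge-0/0/1", 100, "3.3.3.3/32"))  # clash: 100 is taken

per_interface = LabelSpace(per_interface=True)
print(per_interface.bind("ge-0/0/0", 100, "1.1.1.1/32"))
print(per_interface.bind("ge-0/0/1", 100, "3.3.3.3/32"))  # fine, other interface
```

With a per-router space the second binding is refused because label 100 already means 1.1.1.1/32 everywhere on the box, while the per-interface space accepts both.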

Note: We talked a little bit about what FECs were in our last post. In the majority of cases, a FEC is represented by a prefix. That being said, it’s safe in most cases to use the two terms interchangeably. However – FECs get labels assigned to them and in JunOS only a router’s loopback address, by default, is considered a FEC. So when I refer to FECs throughout this post, I’m referring to the router’s /32 loopback address which is (naturally) a prefix. I’ll try to only use the term prefix when referring to a prefix that is routed across an MPLS network and FEC when talking about a prefix that MPLS is aware of (has a label).

So we’ve verified our interfaces are enabled for LDP and we’ve even seen that we have some neighbors at this point. The catch here is that there is a distinction between finding neighbors and building sessions. Just because we have a neighbor doesn’t mean we have a valid LDP session. LDP has merely discovered that there are other possible peers reachable through its interfaces by using LDP hellos. The real magic happens once an LDP session is established. If we look at router 2 we’ll see that we don’t currently have any active LDP sessions…

So why is this? The routers are obviously talking otherwise we wouldn’t see any neighbors. One of the requirements for LDP is that a given router can reach the transport address of any LDP peers. In this case, router 2 doesn’t know about any of the other routers’ loopbacks…

The simple fix here is to setup an IGP like OSPF and advertise the loopbacks into it. Let’s do that on all 4 routers now…

Notice that on the edge routers (routers 1 and 4) I don’t put the user facing interfaces in OSPF. If I did, we wouldn’t need MPLS 🙂

We can quickly validate on one of our hosts that OSPF adjacencies have formed…

And if we look at the LDP sessions we should now see they are operational…

If you check all of your routers, you should see a similar story.  So what do we get out of an LDP session?  Let’s tack on the term extensive to our show ldp session command and see…

As you can see there’s quite a lot of info here but let’s focus on one item for now – the received next-hop addresses for each peer. These are learned as part of the LDP session establishment. If we looked at the output of show ldp neighbor we’d see that a couple of these addresses (for directly connected interfaces) appeared in that output as well. However, this output gives us the complete list of possible IP addresses for each peer. We’ll get to why these are important shortly.

So now that our sessions are established the next thing LDP will start doing for us is sending LDP label mappings. LDP will automatically assign a label to each FEC that a router knows about and advertise that label and FEC (typically a prefix) to all of its peers. LDP does this the easiest way possible – by flooding the advertisements out of every LDP enabled interface. You might recall from the last post that Juniper devices by default advertise their own /32 loopback address as a FEC. So in our case, each router already knows about a FEC to advertise since each router has a local loopback address. In order to advertise the FEC with an LDP label mapping message the router needs to assign a label to the loopback. To see what the router assigned, we can take a look at the LDP database…

The top table, Input label database, shows the labels that vMX1 has received. These are the labels that router 1 will push onto a frame’s label stack when sending traffic to these destination FECs through a given node. The “given node” is an important distinction to make here. Since labels are unique per router we need to track which labels we learn through which LDP peering session. Notice in the output above that the router is showing the input and output label database for the peering session between 1.1.1.1:0 and 2.2.2.2:0. As we’ll see later on – since LDP floods you’ll receive label mappings on any LDP enabled interface but that doesn’t necessarily mean that the router will use them in forwarding. Let’s look at the output of the LDP database on the other 3 nodes as well…

So let’s see what this looks like visually just in terms of the set of labels that router 1 is advertising…

This is where things can get a little confusing if you aren’t used to working with MPLS.  Let’s just talk about how to reach the 1.1.1.1/32 prefix to start with.  In this case router 1 is advertising a label of 3 to router 2 for the FEC 1.1.1.1/32.  Label 3 is a special label which signals the receiving router to perform PHP (Penultimate Hop Pop) so that the sending router receives a native packet and can directly process that rather than dealing with an MPLS header.

Note: Label 3 falls into the reserved label range and is actually called the “implicit null” label.  If an upstream router receives this label it automatically knows to pop the top label off of the label stack before forwarding the frame downstream (I talk about upstream/downstream below).

Router 2 receives the label mapping advertisement for the 1.1.1.1/32 FEC from router 1 and then assigns a label to the FEC locally.  In this case, it assigns a label of 299904 to that FEC and then advertises that label to all of its LDP peers.

Note: While not pictured, you might have caught from the output that router 2 advertises the label not only to router 3, but also back to router 1.  This is because LDP is flooding and based on the output we can see that no split-horizon rules are in play for LDP.  Also note that it advertises the same label for the FEC since we’re using a per-router (platform) label space.
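To make the per-platform label space idea a bit more concrete, here’s a minimal Python sketch.  The class name, the starting label, and the step between labels are all assumptions I picked so the numbers resemble the ones in this lab.  This is not how Junos actually allocates labels.  The only point is that a FEC gets one label per router, so every peer hears the same value.

```python
import itertools


class PlatformLabelSpace:
    """One label per FEC for the whole router, no matter which peer is asking."""

    def __init__(self, first_label=299776, step=16):
        # first_label and step are assumptions chosen to resemble the lab values.
        self._next_label = itertools.count(first_label, step)
        self._bindings = {}

    def label_for(self, fec):
        # Allocate on first use, then always return the same binding.
        if fec not in self._bindings:
            self._bindings[fec] = next(self._next_label)
        return self._bindings[fec]

    def advertise(self, fec, peers):
        # No split-horizon: every peer gets the mapping, with the same label.
        return {peer: self.label_for(fec) for peer in peers}


space = PlatformLabelSpace()
adverts = space.advertise("1.1.1.1/32", ["router 1", "router 3"])
print(adverts)
```

With a per-interface label space the router could hand a different label to each peer for the same FEC, which is exactly what we don’t see in the output.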

Router 3 receives the LDP label mapping from router 2 for FEC 1.1.1.1/32 and assigns a label of 299856 to the FEC locally.  It then advertises the label mapping to all of its LDP peers.  This method of advertising labels is referred to as “downstream unsolicited” since each LSR (Label Switching Router, a fancy term for an MPLS enabled router) is sending all of the label mappings to each of its peers without being asked to.  This is the default means of label distribution with LDP.  So the unsolicited piece makes sense because the LSRs are sending LDP label mapping advertisements whether you like it or not.  But what does downstream mean?

I’ll admit that the upstream and downstream terminology was initially confusing for me when I started using MPLS because it’s all relative.  The key to understanding the terms “upstream” and “downstream” in the MPLS world is that they refer to a specific FEC and how an LSR perceives that FEC.  So in our example above – router 2 would be considered to be an upstream router (or LSR, I use the terms interchangeably here) to router 1 for the FEC 1.1.1.1/32.  Likewise, router 2 would be a downstream LSR to router 3 for the same FEC.  The bottom line is that the closer you are to the originator of the FEC – the further downstream you are.  Label advertisement is downstream to upstream.  So in this case – since the downstream router is advertising label mappings upstream without being asked for them – it’s called downstream unsolicited label distribution.  So if label advertisements flow from downstream to upstream (the blue dotted line) it also makes sense that actual traffic flows from upstream to downstream (the green dotted line).
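If the relative nature of the terms is still fuzzy, a tiny Python sketch might help.  It assumes our lab is a simple linear chain (router 1 through router 4), and the names CHAIN, hops_to, and role are just ones I made up for the example.  Whoever is closer to the router that owns the FEC is the downstream one.

```python
# Hypothetical model of the lab topology: router 1 - router 2 - router 3 - router 4.
CHAIN = ["r1", "r2", "r3", "r4"]


def hops_to(origin, router):
    """Hop count from `router` to the FEC originator on the linear chain."""
    return abs(CHAIN.index(origin) - CHAIN.index(router))


def role(router, neighbor, origin):
    """How `router` relates to `neighbor` for a FEC that `origin` owns."""
    if hops_to(origin, router) < hops_to(origin, neighbor):
        return "downstream"   # closer to the originator
    return "upstream"         # further from the originator


# FEC 1.1.1.1/32 is owned by router 1.
print("r2 relative to r1:", role("r2", "r1", "r1"))
print("r2 relative to r3:", role("r2", "r3", "r1"))
```

Swap the originator to r4 and the same pair of routers trade roles, which is why you always have to name the FEC when you use these terms.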

Now let’s take a minute to address the LDP flooding and lack of split-horizon rules we saw above.  For instance, let’s look at the LDP database on router 2 more closely…

This is the oddity we saw above when we looked at the LDP database on router 1.  Why is router 2 receiving an LDP mapping for 3.3.3.3/32 from both its peering to router 1 and its peering to router 3?  Clearly, router 2 only has a single path to 3.3.3.3/32 and that’s toward router 3.  So why is router 1 advertising a path to 3.3.3.3/32 as well?  The reason is that LDP floods completely indiscriminately.  In other words, its job is just to tell every peer about every FEC it knows about.  So in this case, this is what’s happening with the 3.3.3.3/32 FEC…

So let’s walk through this…

  1. Router 3 has the FEC of 3.3.3.3/32 locally, which it tells all of its neighbors about by advertising the prefix to all of its established LDP peers with a label of 3 (only showing the conversation to router 2 in the diagram).
  2. Router 2 receives the label mapping, assigns a local label of 299872 to the FEC and then advertises that label mapping to all of its LDP peers.  All of its LDP peers happens to include router 1 AND router 3, which was the originator of the FEC to start with.  At this point, LDP just told router 3 that it can reach the FEC of 3.3.3.3/32 by passing router 2 a label of 299872.
  3. Router 1 has also received the label mapping for 3.3.3.3/32 and happily told router 2 that it can reach the FEC by passing it a label of 299872 as well.
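The steps above can be sketched as a small Python simulation.  This is a toy model and not the LDP state machine.  The PEERS map, the flood_fec function, and the label I gave router 4 are all assumptions for illustration, while the labels for router 1 and router 2 are the ones from the walk-through.

```python
from collections import defaultdict

# Hypothetical model of the lab: a linear chain of four LSRs.
PEERS = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2", "r4"], "r4": ["r3"]}
IMPLICIT_NULL = 3


def flood_fec(origin, local_labels):
    """Flood label mappings for one FEC with no split-horizon.

    Returns received[router][peer] = label that `peer` advertised to `router`.
    """
    received = defaultdict(dict)
    already_advertised = set()
    queue = [origin]
    while queue:
        sender = queue.pop(0)
        if sender in already_advertised:
            continue
        already_advertised.add(sender)
        # The originator advertises implicit null, everyone else a local label.
        label = IMPLICIT_NULL if sender == origin else local_labels[sender]
        for peer in PEERS[sender]:  # every peer, even the one we learned it from
            received[peer][sender] = label
            queue.append(peer)
    return received


# r1 and r2 labels come from the walk-through; r4's label is made up.
labels = {"r1": 299872, "r2": 299872, "r4": 299840}
db = flood_fec("r3", labels)  # FEC 3.3.3.3/32, owned by router 3
print("router 2 heard:", db["r2"])
print("router 3 heard:", db["r3"])
```

Router 2 ends up holding a mapping from both neighbors, and router 3 gets told how to reach its own loopback.  Nothing in the flooding itself says which mapping is the useful one.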

So this is a mess.  Moreover – this sort of advertising is looking like it’s going to cause nothing but loops.  So how does a router know which way is the correct downstream path to reach a FEC?  The answer is that each LSR relies on the underlying IGP (in this case OSPF) to determine the right label to use.   So let’s look at the inet.0 routing table on router 2 for the destination 3.3.3.3/32 to make this determination…

The RFC for LDP says that…

An LSR receiving a Label Mapping message from a downstream LSR for a Prefix SHOULD NOT use the label for forwarding unless its routing table contains an entry that exactly matches the FEC Element.

In other words, we need an entry in the inet.0 table for the FEC in order to even consider using the label we received to try and reach it.  As we can see above, we do have a route for 3.3.3.3/32 in the inet.0 table so we’ve passed that requirement.  The question now is how do we determine which label to use?  The information required to make this assessment lives in a couple of different places.  Recall that when we looked at the output of show ldp session extensive on router 2 we learned all of the MPLS enabled addresses of each LDP peer.  In this case, router 2 learned that router 1 has interface 10.1.1.0 and router 3 has interfaces 10.1.1.3 and 10.1.1.4.  If we consult the inet.0 entry for 3.3.3.3/32 we can see that it’s reachable through a physical next hop of 10.1.1.3 which the router knows is associated with the LDP peering toward router 3.  That means that router 3 is the true downstream LSR for this prefix and that router 2 should use the label it learned from router 3 to reach that FEC.  The logic looks like this…
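Here’s that selection logic as a minimal Python sketch from router 2’s point of view.  The dictionary and function names are mine, and the address for router 1 is assumed to be its link address toward router 2.  The labels are the ones from the 3.3.3.3/32 walk-through.  Treat it as an illustration of the idea and not as how the router implements it.

```python
IMPLICIT_NULL = 3

# Addresses each LDP peer announced to router 2 during session establishment.
peer_addresses = {
    "1.1.1.1:0": {"10.1.1.0"},              # assumed link address on router 1
    "3.3.3.3:0": {"10.1.1.3", "10.1.1.4"},
}

# Labels router 2 received for the FEC, tracked per peering session.
ldp_input = {
    "3.3.3.3/32": {"1.1.1.1:0": 299872, "3.3.3.3:0": IMPLICIT_NULL},
}

# What the IGP (OSPF) put in inet.0: prefix -> physical next hop.
igp_next_hop = {"3.3.3.3/32": "10.1.1.3"}


def select_label(fec):
    """Pick the label from the peer that owns the IGP next hop for this FEC."""
    next_hop = igp_next_hop.get(fec)
    if next_hop is None:
        return None  # no exact route match, so no label gets used
    for peer, addresses in peer_addresses.items():
        if next_hop in addresses:
            label = ldp_input.get(fec, {}).get(peer)
            return (peer, label) if label is not None else None
    return None


print(select_label("3.3.3.3/32"))
```

The mapping from router 1 stays in the database but never gets picked, because none of router 1’s addresses match the next hop the IGP chose.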

So you now might be wondering why the inet.0 table on all of the routers doesn’t show any label operations since we just validated that the routers at least have some usable labels.  As in the first post, while the routers know what labels to use to reach a specific FEC, they still don’t have a means to get traffic into an LSP to reach a particular FEC.  As we talked about in the first post, forwarding entries for MPLS are in the inet.3 routing table on Juniper equipment.  Recall that this is the routing table where Juniper stores its FECs and is sometimes called the FEC mapping table.  More specifically, it’s a recursive lookup table leveraged by BGP.  So let’s see what router 1 has in its inet.3 table…

Ahah!  The inet.3 table does list some label operations to reach all of the FECs we currently know about.  If we compare these to the LDP input database on router 1, we’ll see that these entries line up perfectly with the entries in the LDP peering session between router 1 and router 2 that we examined earlier (output above).  Namely…

  • To reach 2.2.2.2 on router 2 we send the traffic out of the ge-0/0/1.0 interface towards 10.1.1.1.  Notice that we do not push a label for this action.  This is because router 1 was told through its LDP peering session that to reach 2.2.2.2 it should use a label of 3.  3 indicates that the penultimate (the second to last) router should pop the label and deliver the traffic normally.  Since router 1 is itself the second to last router on this path, it simply doesn’t impose a label.
  • To reach 3.3.3.3 on router 3 we send the traffic out of the ge-0/0/1.0 interface towards 10.1.1.1.  In this case, we do impose a label of 299872.  This is the label we learned from the downstream router (router 2) for the FEC/prefix 3.3.3.3/32.  You can confirm that by consulting the LDP database on the peering session between router 1 and router 2.  You’ll see that router 2 advertised a label of 299872 to reach the prefix of 3.3.3.3/32.
  • To reach 4.4.4.4 on router 4 we send the traffic out of the ge-0/0/1.0 interface towards 10.1.1.1.  In this case, we do impose a label of 299888.  This is the label we learned from the downstream router (router 2) for the FEC/prefix 4.4.4.4/32.  You can confirm that by consulting the LDP database on the peering session between router 1 and router 2.  You’ll see that router 2 advertised a label of 299888 to reach the prefix of 4.4.4.4/32.
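The three entries above boil down to one rule, which I’ve sketched in Python below.  The function name and the string it returns are made up for the example and are not real Junos output.  The labels, next hop, and interface are the ones from router 1.

```python
IMPLICIT_NULL = 3

# Labels router 1 learned from router 2 for each FEC.
learned_from_router2 = {
    "2.2.2.2/32": IMPLICIT_NULL,
    "3.3.3.3/32": 299872,
    "4.4.4.4/32": 299888,
}


def ingress_action(label, next_hop="10.1.1.1", interface="ge-0/0/1.0"):
    """Describe what router 1 does with traffic for one FEC."""
    if label == IMPLICIT_NULL:
        # We are the penultimate hop, so there is nothing to push.
        action = "no label"
    else:
        action = f"Push {label}"
    return f"to {next_hop} via {interface}, {action}"


for fec, label in learned_from_router2.items():
    print(fec, "->", ingress_action(label))
```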

So this all seems to be lining up but let’s do the exercise one more time on router 2 so that we’re sure it makes sense.  So let’s look at the inet.3 table on router 2…

Notice that in this case, only one of the FECs imposes a label (4.4.4.4/32).  That’s because the other LDP peers (router 1 and 3) are directly connected so they advertise router 2 a label of 3 which you’ll recall is the implicit null label and asks the upstream router to pop the label before forwarding the frame downstream.  The only other label we learned was from router 3 and was for how to reach the 4.4.4.4/32 FEC.  In this case router 3 is saying “Send the frame out of your ge-0/0/1.0 interface and push a label of 299808 onto the frame”.  If we consult the output of show ldp database above from router 3 we’ll see that 299808 is indeed the label router 3 advertised to router 2 for that FEC/prefix.

One thing I hope you’ve noticed is that the labels here are locally significant.  If you replicate this lab on your own, there’s a chance that you’ll see each router allocate the same label for the same prefix.  That would work just as well.  Since each router keeps track of its own set of labels for each specific peering session there’s no reason they can’t overlap.

Looking back at the inet.0 table we’ll see that the router wants to use OSPF to reach all of these prefixes and doesn’t list a label to be pushed…

So despite having the entries in inet.3 they aren’t being used since we don’t have any forwarding entries in inet.0 that reference them.  The solution to this is to put recursive routes in the inet.0 table which will force the router to reference the inet.3 table.  In a real environment this would be done with BGP but, like we did in the first post in this series, we’re going to leverage static routes to keep things simple.  Let’s use static routes to make our client subnets reachable.  To do this, we simply add the following routes on router 1 and router 4…
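To show what I mean by a recursive route, here’s a rough Python sketch of the two-step lookup on router 1.  The client prefix 192.0.2.0/24 is a placeholder I made up (it is not the lab’s real client subnet), and the table and function names are hypothetical.  The inet.3 values are the ones we looked at on router 1.

```python
# inet.3 on router 1: remote loopback -> (physical next hop, label to push or None).
inet3 = {
    "2.2.2.2": ("10.1.1.1", None),
    "3.3.3.3": ("10.1.1.1", 299872),
    "4.4.4.4": ("10.1.1.1", 299888),
}

# Recursive route in inet.0: the next hop is a loopback, not a directly connected address.
# The prefix below is a placeholder, not the real client subnet from the lab.
recursive_routes = {"192.0.2.0/24": "4.4.4.4"}


def resolve(prefix):
    """Resolve a recursive route through inet.3 to a forwarding next hop and label."""
    protocol_next_hop = recursive_routes[prefix]
    if protocol_next_hop not in inet3:
        return None  # no LSP toward that loopback, so no label operation
    forwarding_next_hop, label = inet3[protocol_next_hop]
    return forwarding_next_hop, label


print(resolve("192.0.2.0/24"))
```

The route itself only says “go to 4.4.4.4”.  It’s the second lookup that turns this into “send it toward 10.1.1.1 and push a label”.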

At this point if we consult the inet.0 table on router 1 and router 4 we should see that the client subnets exist and that each router is showing a label operation to reach the destination prefix…

If you look at the ingress LDP database on router 1 and router 4 you’ll find that the labels being pushed to reach the client subnets line up with the label the router learned for that prefix or FEC…

And if you check connectivity on each client, you’ll see that they are now able to ping each other successfully so long as they have a route pointing to their directly connected router interface for the remote client network. To validate we’re doing MPLS, let’s start a ping on the top client to the bottom client and do a capture on the link between router 2 and router 3.  Based on the information we know from looking at the LDP database, we can determine what labels we should see in the capture as the traffic traverses the link in between routers 2 and 3.  For the ICMP echo request I’d expect to see a label value of 299808 and for the ICMP echo reply I’d expect to see a value of 299904.  Let’s see if we’re right…

If that didn’t add up to you, look at the LDP database tables on router 2 and 3 again.  The important thing to remember is upstream versus downstream.  In our case, looking between router 2 and 3 the connectivity looks like this…

The blue arrows show the label advertisements and the green arrows show the actual data path label use.  Don’t be put off if this seems overly confusing.  It’s a relatively simple abstraction but one that can be hard to follow without walking through it step by step.  Fortunately, there’s an easier way to see some of this as well.  What we’ve been looking at so far has mostly been the pieces of control plane that make MPLS work.  The real combination of all these efforts is displayed in the mpls.0 table.  Let’s take a look at that on router 2 so you can see what I mean…

This table describes all label operations that the router will take on labeled packets.  This table confirms what we saw above.  If router 2 receives a frame with label 299888 it will swap the label with 299808 and forward it out of the router’s ge-0/0/1.0 interface toward router 3.  This would be the ICMP echo request coming from the top client headed to the bottom client.  On the return, router 2 told router 3 that it needed a label of 299904 in order to reach the downstream FEC of 1.1.1.1/32.  We can see that if router 2 receives a frame with label 299904 it will pop the label and forward the native frame to router 1 (penultimate hop popping).
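As a last sanity check, here are those two mpls.0 entries from router 2 as a small Python sketch.  The table layout and the forward function are my own simplification for the example.  The labels and operations are the ones described above.

```python
# Router 2's label operations: incoming label -> (operation, outgoing label, next hop).
mpls0_router2 = {
    299888: ("swap", 299808, "router 3"),
    299904: ("pop", None, "router 1"),
}


def forward(label_stack, table):
    """Apply one LSR's operation to the top label; returns (new_stack, next_hop)."""
    top_label = label_stack[0]
    operation, out_label, next_hop = table[top_label]
    if operation == "swap":
        return [out_label] + label_stack[1:], next_hop
    if operation == "pop":
        return label_stack[1:], next_hop
    raise ValueError(f"unknown operation: {operation}")


print(forward([299888], mpls0_router2))  # echo request on its way to router 3
print(forward([299904], mpls0_router2))  # echo reply on its way to router 1
```

An empty label stack after the pop is the “native frame” that router 1 receives thanks to PHP.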

This wasn’t meant to be an exhaustive view of LDP but I hope it gave you an idea of what LDP is capable of.  I also hope this demonstrated how necessary a label distribution protocol is to any MPLS network.  Doing static LSPs just isn’t feasible for any sort of scalable deployment so you’ll want to make sure you’re comfortable with deploying something like LDP to help out with label assignment and distribution.  In the next post, we’ll layer on BGP so we don’t need to use static routes and talk about how LDP handles failures and reroutes.  Stay tuned!
