Fundamentals of MPLS LSPs

One of the items that often trips folks up with MPLS is the concept of label switched paths or LSPs. We’ve talked about them extensively before in many of the blog posts here and I’ve described them a couple of different ways. Many people look at an LSP as a sort of unidirectional tunnel. In fact, most network diagrams aiming to describe an LSP often show it just as that – a tunnel. It’s an easy thing to visualize especially when you start talking about nested tunnels or LSPs inside of LSPs, but I also think it can be rather confusing. This becomes even more confusing when people start talking about end to end LSPs or how a service label is the same end to end as traffic traverses an LSP. What does that mean? Where does an LSP start or stop? Is it really a tunnel? How far can an LSP reach? What if we run different label distribution protocols? In this post, and perhaps the next, I hope to address these questions as well as talk about how we can solve some of the common problems that are often encountered with LSPs.

So let’s dive right in and talk about the LSP to tunnel analogy. As we’ve already seen in previous posts, LSPs are not a tunnel in the sense that most network engineering folks are accustomed to. When I think of tunneling, the first thing that comes to my mind is GRE tunnels. In that model, we’re encapsulating IP inside of IP. Router A encapsulates a packet inside another packet and sends it toward the tunnel destination router B. Router B then strips off the outer IP header and now has the original encapsulated packet to work with. In MPLS – it’s the same model except we’re dealing with a label instead of an outer IP packet header. The difference between something like GRE and MPLS is how the transit routers, or “routers in the middle”, deal with the traffic. In the MPLS world – transit routers only care about labels and label operations. They never inspect the packet to make a decision as to how to forward the tunneled traffic. On the other hand, a GRE transit router still needs to do a route lookup on the outer IP header to sort out where to send the tunneled traffic. Another interesting difference to think about is that MPLS routers change the label (in most cases) as a means to facilitate moving traffic through the LSP. A GRE transit router doesn’t change the outer, or tunnel, IP header. It simply performs a lookup and passes the traffic to the next transit router, which will also do a lookup on that same IP header. When thinking about it at that level – one could compare an MPLS label to the Ethernet header’s destination MAC address. A GRE transit router needs to know the MAC address for the next hop it resolves during its IP lookup and that MAC address needs to be written into the L2 header so that the next hop router knows to accept the frame. In that case you might compare ARP with a label distribution protocol. 
In either case, the routers need to exchange information that tells them how to talk to one another and that information is used to facilitate forwarding at the tunnel level. The significant difference between the two methodologies when viewed from this perspective is that an MPLS LSP (tunnel) is precomputed on all routers that take part in the LSP. A GRE router is still performing lookups at each hop along the path to the tunnel destination. This brings us back to one of the initial selling points of label or tag based forwarding which was that the router did not need to do IP lookups at each hop.
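
To make the contrast between the two transit behaviors concrete, here is a minimal Python sketch. It is not how any real router is implemented, and the prefixes, interface names, and label values are all invented for illustration. The GRE-style hop has to do a longest-prefix match on the outer destination address, while the MPLS-style hop does an exact match on the incoming label and never looks at the payload.

```python
import ipaddress

# Hypothetical transit router state. Prefixes, interface names, and label
# values are made up for this sketch and do not come from the lab.
IP_ROUTES = {
    "10.0.0.0/8": "if-a",
    "10.1.0.0/16": "if-b",
    "0.0.0.0/0": "if-c",
}
LABEL_TABLE = {
    100: ("swap", 200, "if-b"),
    101: ("pop", None, "if-a"),
}


def gre_transit_lookup(outer_dst):
    """Longest-prefix match on the outer IP destination, done at every hop."""
    addr = ipaddress.ip_address(outer_dst)
    matches = [ipaddress.ip_network(p) for p in IP_ROUTES
               if addr in ipaddress.ip_network(p)]
    best = max(matches, key=lambda net: net.prefixlen)
    return IP_ROUTES[str(best)]


def mpls_transit_lookup(label):
    """Exact match on the incoming label; the payload is never inspected."""
    return LABEL_TABLE[label]


print(gre_transit_lookup("10.1.2.3"))   # most specific match wins
print(mpls_transit_lookup(100))         # (operation, outgoing label, interface)
```

The label table is the part that gets filled in ahead of time by a label distribution protocol, which is what "precomputed" means in the paragraph above.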

So at a high level – I think the comparison between an LSP being a networking “tunnel” is rather apt since it behaves much like any other tunnel we typically see in networking. Put more simply – the mechanism used to move the tunneled traffic from point A to point B never needs to examine the traffic that’s being tunneled. This is one of the huge benefits of MPLS since it allows us to have architectures like BGP free core designs where “routers in the middle” don’t need to know how to forward tunneled traffic. More importantly, MPLS allows us to tack on additional “services” like MPLS VPNs that really make the architecture significantly more useful and scalable than a tunneling technology such as GRE.

My point with this rambling is that tunneling or encapsulation in networking is an analogy that can be looked at lots of different ways. I mean one look at the OSI model and you should see that. Despite this – when I initially started working with MPLS years ago – I had a hard time picturing an LSP as a tunnel. It just didn’t align with my thinking of what a tunnel was. But once I started working with it hands on, the LSP to tunnel analogy made a lot more sense to me. You’re encapsulating some traffic inside of MPLS labels, which means routers along the way don’t have to inspect the encapsulated datagrams. And as we’ll see in this post (or maybe the next) the LSP tunnel analogy is helpful as we try to visualize things like hierarchical LSPs (don’t panic – we haven’t talked about what those are yet).

To help cement the concept of LSPs as tunnels, let’s look at an example lab topology where multiple LSPs come into play…

Note: The lab is the same one we used in the previous BGP-LU posts so if you’ve read those then this should look familiar to you.

Above you can see we have a simple topology. I’m including the above diagram to show you the link local IP addressing and the interface names so you can follow along. In future diagrams, I need to take that stuff out so it isn’t so cluttered. The blue numbers represent the interface numbering. AKA – 0 is the same as ge-0/0/0. Here is what our label distribution protocol use will look like…

The above lab demonstrates an important aspect of MPLS LSPs. Here we have two sites (left and right (I know I’m super creative)) separated by a core. The core in this case is running RSVP as a label distribution protocol whereas the left and right sites are running LDP. In this initial iteration, we’re not running any MPLS services across the core (think MPLS VPNs etc), rather we’re just pursuing the basic MPLS use case of a BGP free core. The intent here is that the routers between the two tail routers have no knowledge of the prefixes being advertised through BGP between the left and the right sites. Routers 1 and 7 provide connectivity for two clients aptly named left_client and right_client. Before we layer in BGP – let’s look at the base configuration of each of the routers so you can follow along if you like…

vMX1

set system host-name vmx1.lab
set interfaces ge-0/0/0 unit 0 family inet address 10.10.10.1/24
set interfaces ge-0/0/1 unit 0 family inet address 169.254.10.0/31
set interfaces ge-0/0/1 unit 0 family mpls
set interfaces lo0 unit 0 family inet no-redirects
set interfaces lo0 unit 0 family inet address 1.1.1.1/32 primary
set routing-options router-id 1.1.1.1
set protocols ldp interface ge-0/0/1.0
set protocols mpls interface ge-0/0/1.0
set protocols ospf area 0.0.0.0 interface lo0.0
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 interface-type p2p

vMX2

set system host-name vmx2.lab
set interfaces ge-0/0/0 unit 0 family inet address 169.254.10.1/31
set interfaces ge-0/0/0 unit 0 family mpls
set interfaces ge-0/0/1 unit 0 family inet address 169.254.10.2/31
set interfaces ge-0/0/1 unit 0 family mpls
set interfaces lo0 unit 0 family inet no-redirects
set interfaces lo0 unit 0 family inet address 2.2.2.2/32 primary
set routing-options router-id 2.2.2.2
set protocols ldp interface ge-0/0/0.0
set protocols ldp interface ge-0/0/1.0
set protocols mpls interface ge-0/0/1.0
set protocols mpls interface ge-0/0/0.0
set protocols ospf area 0.0.0.0 interface lo0.0
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 interface-type p2p

vMX3

set system host-name vmx3.lab
set interfaces ge-0/0/0 unit 0 family inet address 169.254.10.3/31
set interfaces ge-0/0/0 unit 0 family mpls
set interfaces ge-0/0/1 unit 0 family inet address 169.254.10.4/31
set interfaces ge-0/0/1 unit 0 family mpls
set interfaces lo0 unit 0 family inet no-redirects
set interfaces lo0 unit 0 family inet address 3.3.3.3/32 primary
set protocols rsvp interface ge-0/0/1.0
set protocols ldp interface ge-0/0/0.0 
set protocols mpls label-switched-path to-vmx5 to 5.5.5.5
set protocols mpls interface ge-0/0/1.0
set protocols mpls interface ge-0/0/0.0
set protocols ospf traffic-engineering
set protocols ospf area 0.0.0.0 interface lo0.0
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 interface-type p2p

vMX4

set system host-name vmx4.lab
set interfaces ge-0/0/0 unit 0 family inet address 169.254.10.5/31
set interfaces ge-0/0/0 unit 0 family mpls
set interfaces ge-0/0/1 unit 0 family inet address 169.254.10.6/31
set interfaces ge-0/0/1 unit 0 family mpls
set interfaces lo0 unit 0 family inet no-redirects
set interfaces lo0 unit 0 family inet address 4.4.4.4/32 primary
set routing-options router-id 4.4.4.4
set protocols rsvp interface ge-0/0/1.0
set protocols rsvp interface ge-0/0/0.0
set protocols mpls interface ge-0/0/1.0
set protocols mpls interface ge-0/0/0.0
set protocols ospf traffic-engineering
set protocols ospf area 0.0.0.0 interface lo0.0
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 interface-type p2p

vMX5

set system host-name vmx5.lab
set interfaces ge-0/0/0 unit 0 family inet address 169.254.10.7/31
set interfaces ge-0/0/0 unit 0 family mpls
set interfaces ge-0/0/1 unit 0 family inet address 169.254.10.8/31
set interfaces ge-0/0/1 unit 0 family mpls
set interfaces lo0 unit 0 family inet no-redirects
set interfaces lo0 unit 0 family inet address 5.5.5.5/32 primary
set routing-options router-id 5.5.5.5
set protocols rsvp interface ge-0/0/0.0
set protocols mpls label-switched-path to-vmx3 to 3.3.3.3
set protocols mpls interface ge-0/0/1.0
set protocols mpls interface ge-0/0/0.0
set protocols ospf traffic-engineering
set protocols ospf area 0.0.0.0 interface lo0.0
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 interface-type p2p
set protocols ldp interface ge-0/0/1.0

vMX6

set system host-name vmx6.lab
set interfaces ge-0/0/0 unit 0 family inet address 169.254.10.9/31
set interfaces ge-0/0/0 unit 0 family mpls
set interfaces ge-0/0/1 unit 0 family inet address 169.254.10.10/31
set interfaces ge-0/0/1 unit 0 family mpls
set interfaces lo0 unit 0 family inet no-redirects
set interfaces lo0 unit 0 family inet address 6.6.6.6/32 primary
set routing-options router-id 6.6.6.6
set protocols mpls interface ge-0/0/1.0
set protocols mpls interface ge-0/0/0.0
set protocols ospf area 0.0.0.0 interface lo0.0
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 interface-type p2p
set protocols ldp interface ge-0/0/0.0
set protocols ldp interface ge-0/0/1.0

vMX7

set system host-name vmx7.lab
set interfaces ge-0/0/0 unit 0 family inet address 169.254.10.11/31
set interfaces ge-0/0/0 unit 0 family mpls
set interfaces ge-0/0/1 unit 0 family inet address 192.168.10.1/24
set interfaces lo0 unit 0 family inet no-redirects
set interfaces lo0 unit 0 family inet address 7.7.7.7/32 primary
set routing-options static route 10.171.200.0/22 next-hop 192.168.127.100
set routing-options router-id 7.7.7.7
set protocols mpls interface ge-0/0/0.0
set protocols ospf area 0.0.0.0 interface lo0.0
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 interface-type p2p
set protocols ldp interface ge-0/0/0.0

As you can see – this is all pretty standard configuration. I’ve enabled MPLS on all of the interfaces where we expect to see labels and RSVP or LDP on the interfaces that require it. On vMX3 and vMX5 I created RSVP LSPs that simply reference the other router’s loopback address. You’ll also notice that all the routers are part of one big OSPF area. That’s me cheating; typically you wouldn’t have the same IGP area in both the core and the remote sites, but for the sake of this discussion it doesn’t matter.
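
If you want to double check how the routers are wired together from the configuration alone, the /31 addressing gives it away. The following is a small Python sketch (the helper name `derive_links` is my own, not anything from Junos) that groups the interface addresses from the configurations above by the /31 network they fall in. Each /31 with two members is one point-to-point link.

```python
import ipaddress
from collections import defaultdict

# Interface addresses copied from the base configurations above.
INTERFACES = {
    ("vmx1", "ge-0/0/1"): "169.254.10.0/31",
    ("vmx2", "ge-0/0/0"): "169.254.10.1/31",
    ("vmx2", "ge-0/0/1"): "169.254.10.2/31",
    ("vmx3", "ge-0/0/0"): "169.254.10.3/31",
    ("vmx3", "ge-0/0/1"): "169.254.10.4/31",
    ("vmx4", "ge-0/0/0"): "169.254.10.5/31",
    ("vmx4", "ge-0/0/1"): "169.254.10.6/31",
    ("vmx5", "ge-0/0/0"): "169.254.10.7/31",
    ("vmx5", "ge-0/0/1"): "169.254.10.8/31",
    ("vmx6", "ge-0/0/0"): "169.254.10.9/31",
    ("vmx6", "ge-0/0/1"): "169.254.10.10/31",
    ("vmx7", "ge-0/0/0"): "169.254.10.11/31",
}


def derive_links(interfaces):
    """Pair up interfaces that share a /31; each pair is one link."""
    by_network = defaultdict(list)
    for (router, ifname), addr in interfaces.items():
        network = ipaddress.ip_interface(addr).network
        by_network[network].append((router, ifname))
    links = []
    for network in sorted(by_network):
        ends = by_network[network]
        if len(ends) == 2:
            links.append((ends[0], ends[1]))
    return links


for (r1, if1), (r2, if2) in derive_links(INTERFACES):
    print(f"{r1} {if1} <-> {r2} {if2}")
```

Run against the addresses above, this yields a simple chain from vMX1 through vMX7, which matches the left site, core, right site layout described earlier.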

Now that we have our base configuration done, let’s look at configuring BGP. If our goal was simply to advertise the client prefixes from vMX1 to vMX7 – you’d think that a direct peering between the two would be sufficient. So let’s imagine that vMX1 is in AS 65001 and vMX7 is in AS 65002 and try that out…

vMX1

set routing-options autonomous-system 65001 
set protocols bgp group external peer-as 65002 
set protocols bgp group external neighbor 7.7.7.7 family inet unicast               
set protocols bgp group external multihop ttl 255    
set protocols bgp group external type external      
set protocols bgp group external local-address 1.1.1.1 
set policy-options policy-statement bgp-export term direct from protocol direct 
set policy-options policy-statement bgp-export term direct from prefix-list bgp-export           
set policy-options policy-statement bgp-export term direct then accept                    
set protocols bgp group external export bgp-export                                        
set policy-options prefix-list bgp-export 10.10.10.0/24 

vMX7

set routing-options autonomous-system 65002
set protocols bgp group external peer-as 65001 
set protocols bgp group external neighbor 1.1.1.1 family inet unicast               
set protocols bgp group external multihop ttl 255    
set protocols bgp group external type external  
set protocols bgp group external local-address 7.7.7.7      
set policy-options policy-statement bgp-export term direct from protocol direct 
set policy-options policy-statement bgp-export term direct from prefix-list bgp-export           
set policy-options policy-statement bgp-export term direct then accept                    
set protocols bgp group external export bgp-export                                        
set policy-options prefix-list bgp-export 192.168.10.0/24

So nothing too fancy here. The only weird part is that I’m doing an eBGP peering to a loopback so I need to increase the TTL for multihop, but the rest seems pretty straightforward. Once that config is committed we should see our peerings come up and we should get the routes from the other side…

[email protected]> show route table inet.0 192.168.10.0/24 

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.10.0/24    *[BGP/170] 00:03:39, localpref 100, from 7.7.7.7
                      AS path: 65002 I, validation-state: unverified
                    > to 169.254.10.1 via ge-0/0/1.0

[email protected]> 
[email protected]> show route table inet.0 10.10.10.0/24 

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[BGP/170] 00:04:00, localpref 100, from 1.1.1.1
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.10 via ge-0/0/0.0

[email protected]> 

All looks good. So can the clients ping each other?

left_client:~# ping 192.168.10.100 -c 5
PING 192.168.10.100 (192.168.10.100) 56(84) bytes of data.

--- 192.168.10.100 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

left_client:~# 

Negative. Any guesses as to why? Let’s try and track this down…

[email protected]> show route table inet.0 192.168.10.100 extensive 

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
192.168.10.0/24 (1 entry, 1 announced)
TSI:
KRT in-kernel 192.168.10.0/24 -> {indirect(1048574)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect, Next hop index: 0
                Address: 0xd0078d0
                Next-hop reference count: 2
                Source: 7.7.7.7
                Next hop type: Router, Next hop index: 583
                Next hop: 169.254.10.1 via ge-0/0/1.0, selected
                Session Id: 0x140
                Protocol next hop: 7.7.7.7
                Indirect next hop: 0xb68cec0 1048574 INH Session ID: 0x142
                State: <Active Ext>
                Local AS: 65001 Peer AS: 65002
                Age: 9 	Metric2: 6 
                Validation State: unverified 
                Task: BGP_65002.7.7.7.7+179
                Announcement bits (2): 0-KRT 6-Resolve tree 2 
                AS path: 65002 I 
                Accepted
                Localpref: 100
                Router ID: 7.7.7.7
                Indirect next hops: 1
                        Protocol next hop: 7.7.7.7 Metric: 6
                        Indirect next hop: 0xb68cec0 1048574 INH Session ID: 0x142
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 169.254.10.1 via ge-0/0/1.0
                                Session Id: 0x140
			7.7.7.7/32 Originating RIB: inet.0
			  Metric: 6			  Node path count: 1
			  Forwarding nexthops: 1
				Nexthop: 169.254.10.1 via ge-0/0/1.0
				Session Id: 140

[email protected]> 

If we look at the route we’re getting from vMX7 – we’ll see that it’s being advertised with a protocol next hop of 7.7.7.7. This is to be expected since that’s the loopback address vMX7 uses as the local address for the eBGP session (it also happens to be vMX7’s router ID). So let’s look and see what the router wants to do with this route when we try to talk to right_client…

[email protected]> show route forwarding-table destination 192.168.10.100 
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
192.168.10.0/24    user     0                    indr  1048574     2
                              169.254.10.1       ucst      583    19 ge-0/0/1.0

Routing table: __pfe_private__.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      514     2

Routing table: __juniper_services__.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      527     2

Routing table: __master.anon__.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    rjct      542     1

[email protected]> 

Notice the entry for 192.168.10.0/24 in the default.inet table above. vMX1 thinks it should forward the traffic over to 169.254.10.1 which is the ge-0/0/0 interface of vMX2. This doesn’t seem like a problem since that’s the direction we need to go. So now let’s go check out what vMX2 will do with that traffic…

[email protected]> show route forwarding-table destination 192.168.10.100 
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    rjct       36     1

Routing table: __juniper_services__.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      514     2

Routing table: __pfe_private__.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      527     2

Routing table: __master.anon__.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    rjct      542     1

[email protected]> 

Huh. So vMX2 wants nothing to do with that traffic and is going to drop it. So why is that? Well, the reason is that vMX2 doesn’t know about the route to 192.168.10.0/24 since that’s only being shared between the two eBGP peers, vMX1 and vMX7…

[email protected]> show route table inet.0 192.168.10.0/24 

[email protected]> 

This is one of the reasons we wanted to run MPLS – so that every router doesn’t need to know about all the BGP prefixes. So this is actually the behavior we expect to see. In order for this to work, traffic toward 192.168.10.100 needs to enter an MPLS LSP on vMX1 and for some reason that isn’t happening.

The reason is simple – MPLS LSPs need to be end to end. As this design stands, we have 3 distinct label domains. The left site is only aware of the routers in its LDP domain, the core is only aware of routers speaking RSVP, and the right site is only aware of routers in its LDP domain. We know that for traffic to get into an LDP or RSVP LSP we need to have an entry in the router’s inet.3 table. Let’s see what vMX1 knows about…

[email protected]> show route table inet.3 

inet.3: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[LDP/9] 00:53:22, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0
3.3.3.3/32         *[LDP/9] 00:53:20, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0, Push 299792

[email protected]> 

As you can see vMX1 is only aware of vMX2 and vMX3. If we look at vMX3 we’ll see that it’s aware of its LDP peers (vMX1 and vMX2) as well as its RSVP peer (vMX5) in the inet.3 table. In other words – these are the only routers that vMX3 knows how to get to through an LSP…

[email protected]> show route table inet.3 

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[LDP/9] 00:11:27, metric 1
                    > to 169.254.10.2 via ge-0/0/0.0, Push 299776
2.2.2.2/32         *[LDP/9] 00:11:27, metric 1
                    > to 169.254.10.2 via ge-0/0/0.0
5.5.5.5/32         *[RSVP/7/1] 00:53:08, metric 2
                    > to 169.254.10.5 via ge-0/0/1.0, label-switched-path to-vmx5

[email protected]> 

So the problem is that vMX1 doesn’t know how to get traffic to 192.168.10.100 into an LSP since it’s not aware of an LSP towards 7.7.7.7. If we were to quickly enable LDP on the interfaces where we’re running RSVP currently we’d see that our ping would start working. In fact, let’s do that so you can see…

vMX3

set protocols ldp interface ge-0/0/1 

vMX4

set protocols ldp interface ge-0/0/0
set protocols ldp interface ge-0/0/1 

vMX5

set protocols ldp interface ge-0/0/0

Now our ping should start working…

root@left_client:~# ping 192.168.10.100 -c 5
64 bytes from 192.168.10.100: icmp_seq=1 ttl=57 time=3.86 ms
64 bytes from 192.168.10.100: icmp_seq=2 ttl=57 time=4.63 ms
64 bytes from 192.168.10.100: icmp_seq=3 ttl=57 time=4.12 ms
64 bytes from 192.168.10.100: icmp_seq=4 ttl=57 time=6.33 ms
64 bytes from 192.168.10.100: icmp_seq=5 ttl=57 time=4.68 ms
root@left_client:~# 

And if we look at the inet.3 table on vMX1 again…

[email protected]> show route table inet.3    

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[LDP/9] 00:57:17, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0
3.3.3.3/32         *[LDP/9] 00:57:15, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0, Push 299792
4.4.4.4/32         *[LDP/9] 00:01:19, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0, Push 299872
5.5.5.5/32         *[LDP/9] 00:00:50, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0, Push 299888
6.6.6.6/32         *[LDP/9] 00:00:50, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0, Push 299904
7.7.7.7/32         *[LDP/9] 00:00:50, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0, Push 299920

[email protected]> 

Also notice that our forwarding table now shows a push action…

[email protected]> show route forwarding-table destination 192.168.10.100    
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
192.168.10.0/24    user     0                    indr  1048574     2
                              169.254.10.1      Push 299920      593     2 ge-0/0/1.0

Routing table: __pfe_private__.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      514     2

Routing table: __juniper_services__.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      527     2

Routing table: __master.anon__.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    rjct      542     1

[email protected]> 
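
The difference between the two forwarding-table results on vMX1 comes down to where the BGP protocol next hop (7.7.7.7) resolves. Here is a rough Python sketch of that decision. It is a simplification under my own assumptions: it only does exact matches on /32 destinations, and the function name `resolve_bgp_next_hop` is made up. The table contents are taken from the vMX1 outputs above.

```python
# Simplified view of vMX1's tables, keyed by exact /32 destination.
# Values in INET3 are (forwarding next hop, label to push or None).
INET0 = {"7.7.7.7": "169.254.10.1"}  # IGP route to vMX7's loopback
INET3_BEFORE = {
    "2.2.2.2": ("169.254.10.1", None),
    "3.3.3.3": ("169.254.10.1", 299792),
}
# After temporarily enabling LDP across the core, 7.7.7.7 shows up in inet.3.
INET3_AFTER = {**INET3_BEFORE, "7.7.7.7": ("169.254.10.1", 299920)}


def resolve_bgp_next_hop(protocol_next_hop, inet3, inet0):
    """Prefer an LSP (inet.3) for the protocol next hop, else fall back to plain IP."""
    if protocol_next_hop in inet3:
        return ("mpls", inet3[protocol_next_hop])
    if protocol_next_hop in inet0:
        return ("ip", inet0[protocol_next_hop])
    return ("unresolved", None)


print(resolve_bgp_next_hop("7.7.7.7", INET3_BEFORE, INET0))
print(resolve_bgp_next_hop("7.7.7.7", INET3_AFTER, INET0))
```

In the first case the traffic leaves vMX1 as plain IP and vMX2 has to make a routing decision it can't make. In the second case vMX1 pushes a label and vMX2 only has to switch on that label.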

So let’s roll back that config on vMX3, vMX4, and vMX5 (rollback 1) and talk about how to solve this. It’s obvious to me at this point that we have something that looks like this…

Within each label domain, we have a path from edge to edge. The problem is – we don’t have an end to end label path from vMX1 to vMX7. Without that – we’d be asking each border router (in this case vMX3 and vMX5) to have the traffic leave one LSP and re-enter another. That is, left_client traffic would need to enter an LSP on vMX1 and head as far as it can to vMX3, which is the LDP LSP endpoint. At vMX3 some type of lookup would need to take place that would tell the traffic to enter into the RSVP LSP to vMX5, with a similar lookup having to occur at vMX5 to reach vMX7. I like to think of MPLS forwarding paths as having to be completely pre-determined. Having to leave an LSP, do a lookup for the next hop, and then enter another LSP, is not pre-determined. This is one of the bigger misconceptions I find when talking with folks about MPLS. There is a general assumption that because there exists a path of LSPs that this is the same as having an end to end LSP.
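
That last distinction is easy to show in a few lines of Python. This is only a toy model under my own assumptions: I treat each LDP site as a full mesh of LSPs between its routers and list the two RSVP LSPs configured in the core, and the function names are invented. The point is that a chain of LSPs from vMX1 to vMX7 exists, yet no single LSP does.

```python
from itertools import permutations

# LDP builds LSPs between every pair of routers inside each site.
LDP_DOMAINS = [{"vmx1", "vmx2", "vmx3"}, {"vmx5", "vmx6", "vmx7"}]
# RSVP only builds the LSPs we explicitly configured (to-vmx5 and to-vmx3).
RSVP_LSPS = {("vmx3", "vmx5"), ("vmx5", "vmx3")}


def available_lsps():
    """All (ingress, egress) pairs for which some router has an LSP."""
    lsps = set(RSVP_LSPS)
    for domain in LDP_DOMAINS:
        lsps.update(permutations(sorted(domain), 2))
    return lsps


def has_end_to_end_lsp(ingress, egress):
    """True only if the ingress itself has an LSP all the way to the egress."""
    return (ingress, egress) in available_lsps()


def chain_of_lsps_exists(ingress, egress):
    """True if separate LSPs could be walked one after another to the egress."""
    lsps = available_lsps()
    seen, frontier = {ingress}, [ingress]
    while frontier:
        node = frontier.pop()
        for head, tail in lsps:
            if head == node and tail not in seen:
                seen.add(tail)
                frontier.append(tail)
    return egress in seen


print(has_end_to_end_lsp("vmx1", "vmx7"))    # no single LSP spans all three domains
print(chain_of_lsps_exists("vmx1", "vmx7"))  # but the pieces do line up
```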

So how do we fix this? Well – there are a couple of solutions here. The one I want to focus on in this post requires the use of BGP-LU or BGP Labelled Unicast. If you’re unfamiliar with the labelled unicast address family have a quick read here. If you don’t want to do that – I can boil it down to saying that it causes BGP to allocate a label for each prefix it advertises. So why is this helpful? Well to create an end to end LSP we need some mechanism on our border routers (vMX3 and vMX5) to keep the LSP alive. That is – we need to have a label operation they can use to stitch together our discrete LSPs. Let’s configure this and I’ll talk more about it.

To configure BGP labelled unicast, we’re going to redo our BGP peering to look more like this…

You might be asking if we need all these peering sessions for this to work and the short answer is yes – we’ll see why shortly. So let’s drop these configurations on each router…

vMX1

delete protocols bgp
set protocols bgp group internal peer-as 65001
set protocols bgp group internal neighbor 3.3.3.3 family inet labeled-unicast
set protocols bgp group internal local-address 1.1.1.1
set protocols bgp group internal type internal
set protocols bgp group internal export bgp-export 
set policy-options policy-statement bgp-export term direct from protocol direct 
set policy-options policy-statement bgp-export term direct from prefix-list bgp-export           
set policy-options policy-statement bgp-export term direct then accept                                                      
set policy-options prefix-list bgp-export 10.10.10.0/24 

vMX3

set routing-options autonomous-system 65001
set protocols bgp group internal peer-as 65001
set protocols bgp group internal neighbor 1.1.1.1 family inet labeled-unicast
set protocols bgp group internal local-address 3.3.3.3
set protocols bgp group internal type internal
set protocols bgp group external type external
set protocols bgp group external multihop ttl 255
set protocols bgp group external local-address 3.3.3.3
set protocols bgp group external peer-as 65002
set protocols bgp group external neighbor 5.5.5.5 family inet labeled-unicast

vMX5

set routing-options autonomous-system 65002
set protocols bgp group internal peer-as 65002
set protocols bgp group internal neighbor 7.7.7.7 family inet labeled-unicast
set protocols bgp group internal local-address 5.5.5.5
set protocols bgp group internal type internal
set protocols bgp group external type external
set protocols bgp group external multihop ttl 255
set protocols bgp group external local-address 5.5.5.5
set protocols bgp group external peer-as 65001
set protocols bgp group external neighbor 3.3.3.3 family inet labeled-unicast

vMX7

delete protocols bgp
set protocols bgp group internal peer-as 65002
set protocols bgp group internal neighbor 5.5.5.5 family inet labeled-unicast
set protocols bgp group internal local-address 7.7.7.7
set protocols bgp group internal type internal
set protocols bgp group internal export bgp-export 
set policy-options prefix-list customer1 192.168.10.0/24 
set policy-options policy-statement bgp-export term direct from protocol direct 
set policy-options policy-statement bgp-export term direct from prefix-list bgp-export           
set policy-options policy-statement bgp-export term direct then accept                                                           
set policy-options prefix-list bgp-export 192.168.10.0/24

Once this configuration is in place, you should be able to validate that all of the BGP peerings are up on each of the 4 routers. If they aren’t – double check your configuration. Once up – you should notice that your ping from left_client to right_client is now once again working!

left_client:~# ping 192.168.10.100 -c 5
64 bytes from 192.168.10.100: icmp_seq=1 ttl=57 time=3.86 ms
64 bytes from 192.168.10.100: icmp_seq=2 ttl=57 time=4.63 ms
64 bytes from 192.168.10.100: icmp_seq=3 ttl=57 time=4.12 ms
64 bytes from 192.168.10.100: icmp_seq=4 ttl=57 time=6.33 ms
64 bytes from 192.168.10.100: icmp_seq=5 ttl=57 time=4.68 ms
left_client:~# 

So what magic is happening to allow this to work? The key to understanding what’s happening is what I mentioned in my post on BGP-LU…

Anytime a router changes the next-hop of a prefix it’s advertising – it must allocate a new label.

And when does a BGP router change the next hop of the route it’s advertising? Well – by default that happens over an eBGP peering. But in the case of BGP-LU it happens more frequently…

When the labeled-unicast statement is used, the local router automatically performs a next hop to self on all routes advertised into EBGP from IBGP and into IBGP from EBGP
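
The two rules quoted above can be sketched as a toy BGP-LU speaker in Python. Everything here is an assumption for illustration: the class and method names are mine, the label values are invented rather than taken from the lab, and a real router may advertise a reserved label (such as implicit null) for its own prefixes instead of a regular one. What the sketch does show is the pattern: every time the next hop is rewritten to self, a new label is allocated and bound to the label that was received.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class LabeledRoute:
    prefix: str
    next_hop: str
    label: int


class Speaker:
    """A toy BGP-LU speaker. Label values are made up, not the lab's."""

    def __init__(self, name, loopback, first_label):
        self.name = name
        self.loopback = loopback
        self._labels = count(first_label)
        # local label -> (label received from upstream, upstream next hop)
        self.swap_table = {}

    def originate(self, prefix):
        """Advertise a local prefix with ourselves as next hop."""
        return LabeledRoute(prefix, self.loopback, next(self._labels))

    def readvertise(self, route):
        """Next hop changes to self, so allocate a new label and bind it to the old one."""
        new_label = next(self._labels)
        self.swap_table[new_label] = (route.label, route.next_hop)
        return LabeledRoute(route.prefix, self.loopback, new_label)


vmx7 = Speaker("vmx7", "7.7.7.7", 1000)
vmx5 = Speaker("vmx5", "5.5.5.5", 2000)
vmx3 = Speaker("vmx3", "3.3.3.3", 3000)

r7 = vmx7.originate("192.168.10.0/24")  # vMX7 -> vMX5 (iBGP)
r5 = vmx5.readvertise(r7)               # vMX5 -> vMX3 (eBGP, next hop self)
r3 = vmx3.readvertise(r5)               # vMX3 -> vMX1 (iBGP, next hop self)

for route in (r7, r5, r3):
    print(route)
print(vmx3.swap_table)
```

The entry left behind in each `swap_table` is what lets a border router keep the traffic labeled as it crosses from one label domain into the next.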

So in our case – each of the BGP enabled routers will allocate a new label. Let’s dig in and make sure that’s what is happening. Let’s start at vMX1 and look to see what it thinks it should be doing to reach 192.168.10.100…

[email protected]> show route table inet.0 192.168.10.100 

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.10.0/24    *[BGP/170] 00:09:48, localpref 100, from 3.3.3.3
                      AS path: 65002 I, validation-state: unverified
                    > to 169.254.10.1 via ge-0/0/1.0, Push 301296, Push 301152(top)

[email protected]> 

Interesting. So not only do we now have an MPLS operation, we have two of them! So where are these labels coming from? Let’s look a little deeper to see what we can find…

[email protected]> show ldp database 
Input label database, 1.1.1.1:0--2.2.2.2:0
Labels received: 3
  Label     Prefix
 301136      1.1.1.1/32
      3      2.2.2.2/32
 301152      3.3.3.3/32

Output label database, 1.1.1.1:0--2.2.2.2:0
Labels advertised: 3
  Label     Prefix
      3      1.1.1.1/32
 299856      2.2.2.2/32
 299872      3.3.3.3/32

[email protected]> 

Ok – so our top label of 301152 is something we learned through LDP. vMX2 is telling us that if we want to get to 3.3.3.3 we should send it a label of 301152. Ok – so now where does label 301296 come from?..

[email protected]> show route receive-protocol bgp 3.3.3.3 table inet.0 extensive 

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
* 192.168.10.0/24 (1 entry, 1 announced)
     Accepted
     Route Label: 301296
     Nexthop: 3.3.3.3
     Localpref: 100
     AS path: 65002 I 
     Entropy label capable, next hop field matches route next hop

[email protected]> 

It’s coming from BGP! So vMX1 thinks that to get to 192.168.10.100 it should push a label of 301296 followed by 301152. If you’re confused as to what’s going on, just stay with us for the moment. Let’s track this down and figure out what’s going on. Let’s see what vMX2 does with label 301152…

[email protected]> show route table mpls.0 label 301152 

mpls.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

301152             *[LDP/9] 23:10:09, metric 1
                    > to 169.254.10.3 via ge-0/0/1.0, Pop      
301152(S=0)        *[LDP/9] 23:10:09, metric 1
                    > to 169.254.10.3 via ge-0/0/1.0, Pop      

[email protected]> 

Alright – so it’s going to pop the label and send it out of ge-0/0/1 toward vMX3. This makes sense since we know that vMX1 was trying to send traffic to the end of the local LSP which is vMX3. So that means vMX3 will get a frame with a single MPLS label of 301296. What will it do with it?..

[email protected]> show route table mpls.0 label 301296 

mpls.0: 11 destinations, 11 routes (11 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

301296             *[VPN/170] 00:12:01, metric2 2, from 5.5.5.5
                    > to 169.254.10.5 via ge-0/0/1.0, label-switched-path to-vmx5

[email protected]> 

It’s going to push it right into the RSVP LSP toward vMX5. Brilliant! So it appears that the use of BGP LU is successfully stitching our LSPs together. But let’s back up a second, how is it actually doing this? Let’s look at vMX1 again…

[email protected]> show route table inet.3 

inet.3: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[LDP/9] 23:11:06, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0
3.3.3.3/32         *[LDP/9] 23:10:59, metric 1
                    > to 169.254.10.1 via ge-0/0/1.0, Push 301152

[email protected]> 

Notice that we still don’t know about the 7.7.7.7 endpoint so there’s no way we can create an LSP directly to it. Let’s look at the route we’re getting through BGP in more detail…

[email protected]> show route table inet.0 192.168.10.0/24 extensive 

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
192.168.10.0/24 (1 entry, 1 announced)
TSI:
KRT in-kernel 192.168.10.0/24 -> {indirect(1048574)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect, Next hop index: 0
                Address: 0xd007f30
                Next-hop reference count: 2
                Source: 3.3.3.3
                Next hop type: Router, Next hop index: 589
                Next hop: 169.254.10.1 via ge-0/0/1.0, selected
                Label operation: Push 301296, Push 301152(top)
                Label TTL action: prop-ttl, prop-ttl(top)
                Load balance label: Label 301296: None; Label 301152: None; 
                Label element ptr: 0xd007800
                Label parent element ptr: 0xd006de0
                Label element references: 1
                Label element child references: 0
                Label element lsp id: 0
                Session Id: 0x165
                Protocol next hop: 3.3.3.3
                Label operation: Push 301296
                Label TTL action: prop-ttl
                Load balance label: Label 301296: None; 
                Indirect next hop: 0xb68d630 1048574 INH Session ID: 0x167
                State: <Active Int Ext>
                Local AS: 65001 Peer AS: 65001
                Age: 12:40 	Metric2: 1 
                Validation State: unverified 
                Task: BGP_65001.3.3.3.3+50896
                Announcement bits (2): 0-KRT 6-Resolve tree 2 
                AS path: 65002 I 
                Accepted
                Route Label: 301296
                Localpref: 100
                Router ID: 3.3.3.3
                Indirect next hops: 1
                        Protocol next hop: 3.3.3.3 Metric: 1
                        Label operation: Push 301296
                        Label TTL action: prop-ttl
                        Load balance label: Label 301296: None; 
                        Indirect next hop: 0xb68d630 1048574 INH Session ID: 0x167
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 169.254.10.1 via ge-0/0/1.0
                                Session Id: 0x165
			3.3.3.3/32 Originating RIB: inet.3
			  Metric: 1			  Node path count: 1
			  Forwarding nexthops: 1
				Nexthop: 169.254.10.1 via ge-0/0/1.0
				Session Id: 0
                                        
[email protected]> 

Ah ha. So vMX1 thinks that route is coming from vMX3. And fortunately for us, vMX1 does know how to get to vMX3 – through the LDP LSP. And since we’re getting a route label through the BGP LU address family, vMX1 also knows that if it wants to reach that destination, it should first push the route label onto the frame before putting it in the LDP LSP. Let’s finish walking through what’s happening from vMX3 to vMX7 and then put it all together in a neat little diagram.
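If it helps, the recursion vMX1 just performed can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `inet3` dictionary and the `resolve_stack` function are hypothetical names, and the table holds just the one LDP entry we saw in inet.3 that carries a label.

```python
# Transport labels vMX1 knows, keyed by LSP endpoint (taken from inet.3 above).
inet3 = {"3.3.3.3": [301152]}  # LDP label toward vMX3


def resolve_stack(route_label, protocol_next_hop, inet3):
    """Build the label stack for a BGP-LU route, bottom label first.

    Returns None when the protocol next hop has no LSP in inet3,
    which is the case where the route cannot be used over MPLS.
    """
    transport = inet3.get(protocol_next_hop)
    if transport is None:
        return None
    # The BGP route label goes on first (bottom), the transport label last (top).
    return [route_label] + transport


stack = resolve_stack(301296, "3.3.3.3", inet3)
print(stack)       # bottom ... top
print(stack[-1])   # the label vMX2 will actually look at
```

The last element of the list is the top of the stack, which matches the `Push 301296, Push 301152(top)` operation in the route output.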

So we saw that vMX3 was going to stitch our LDP LSP into our RSVP LSP above. Let’s look at that again with a little more output…

[email protected]> show route table mpls.0 label 301296                

mpls.0: 11 destinations, 11 routes (11 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

301296             *[VPN/170] 00:22:59, metric2 2, from 5.5.5.5
                    > to 169.254.10.5 via ge-0/0/1.0, label-switched-path to-vmx5

[email protected]> show rsvp session lsp name to-vmx5                  
Ingress RSVP: 1 sessions
To              From            State   Rt Style Labelin Labelout LSPname 
5.5.5.5         3.3.3.3         Up       0  1 FF       -   301120 to-vmx5
Total 1 displayed, Up 1, Down 0

Egress RSVP: 1 sessions
Total 0 displayed, Up 0, Down 0

Transit RSVP: 0 sessions
Total 0 displayed, Up 0, Down 0

[email protected]> show route forwarding-table label 301296            
Routing table: default.mpls
MPLS:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
301296             user     0                    indr  1048577     2
                              169.254.10.5      Swap 301248, Push 301120(top)      590     2 ge-0/0/1.0

Routing table: __mpls-oam__.mpls
MPLS:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      553     1

[email protected]> 

If we look at line 7 of the output above (the mpls.0 entry) – you’ll see that when vMX3 receives label 301296 from vMX2, it’s going to push it into the RSVP LSP that we created to vMX5. If we look at that RSVP LSP, we’ll see on line 12 (the ingress session) that it will push label 301120 to get into that LSP. But when we look at what’s going on in the actual forwarding plane, we’ll see on line 26 (the forwarding-table entry) that it wants to swap the current top label to 301248 and then push 301120 onto the label stack, making it the new top label. So we know where 301120 came from, but where did 301248 come from? As you might have guessed – it comes from BGP…

[email protected]> show route receive-protocol bgp 5.5.5.5 extensive 

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
* 192.168.10.0/24 (1 entry, 1 announced)
     Accepted
     Route Label: 301248
     Nexthop: 5.5.5.5
     AS path: 65002 I 
     Entropy label capable, next hop field matches route next hop

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)

mpls.0: 11 destinations, 11 routes (11 active, 0 holddown, 0 hidden)

inet6.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)

[email protected]> 

It comes from vMX5! Alright – so now let’s look at what vMX4 does with that top label…

[email protected]> show route table mpls.0 label 301120 

mpls.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

301120             *[RSVP/7/1] 23:32:14, metric 1
                    > to 169.254.10.7 via ge-0/0/1.0, label-switched-path to-vmx5
301120(S=0)        *[RSVP/7/1] 23:32:14, metric 1
                    > to 169.254.10.7 via ge-0/0/1.0, label-switched-path to-vmx5

[email protected]> show route forwarding-table label 301120 
Routing table: default.mpls
MPLS:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
301120             user     0 169.254.10.7      Pop        583     2 ge-0/0/1.0
301120(S=0)        user     0 169.254.10.7      Pop        584     2 ge-0/0/1.0

Routing table: __mpls-oam__.mpls
MPLS:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      540     1

[email protected]> 

So we’re still in the RSVP LSP – but since vMX4 is the PHP router we’ll be popping off the label which means that vMX5 is going to receive a frame with the label 301248. vMX5 will then….

[email protected]> show route table mpls.0 label 301248 

mpls.0: 11 destinations, 11 routes (11 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

301248             *[VPN/170] 00:35:35, metric2 1, from 7.7.7.7
                    > to 169.254.10.9 via ge-0/0/1.0, Swap 301216

[email protected]> 

Notice how vMX5 is now performing only a swap operation to label 301216. In other words – it will be sending vMX6 a labelled frame with a single label. Since vMX5 will be pushing us into our tail LSP, there’s no need for a second label for further stitching. Let’s finish this up by seeing what vMX6 will do…

[email protected]> show route table mpls.0 label 301216 

mpls.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

301216             *[LDP/9] 23:35:20, metric 1
                    > to 169.254.10.11 via ge-0/0/1.0, Pop      
301216(S=0)        *[LDP/9] 23:35:20, metric 1
                    > to 169.254.10.11 via ge-0/0/1.0, Pop      

[email protected]> 

vMX6 is the PHP router so it will pop the label and deliver the naked frame to vMX7, which has the destination network 192.168.10.0/24 directly attached. Looking at this from a visual perspective, it would look like this…

Now – keep in mind that what I’m depicting above is the actual datapath. The arrows above indicate where the label being sent is acted upon or used. For instance, the LDP labels are used at each hop within the LDP domain, whereas the BGP-LU labels are advertised across a label domain and are only used in the stitching from label domain to label domain.
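The same datapath can also be traced in code. Here’s a small Python sketch that replays the label operations we found on each router, using the label values from the outputs above. The `apply_op` and `walk` helpers and the operation names are made up for this illustration; the stack is a plain list whose last element is the top label.

```python
def apply_op(stack, op):
    """Apply one label operation and return the new stack (top is last)."""
    kind, *labels = op
    stack = list(stack)
    if kind == "push":
        stack.extend(labels)        # labels given in push order, last one ends on top
    elif kind == "pop":
        stack.pop()                 # remove the top label (PHP in our walkthrough)
    elif kind == "swap":
        stack[-1] = labels[0]       # rewrite the top label
    elif kind == "swap_push":
        stack[-1] = labels[0]       # rewrite the BGP-LU label...
        stack.append(labels[1])     # ...then push the transport label on top
    else:
        raise ValueError(f"unknown operation: {kind}")
    return stack


# Operation each router performs for traffic toward 192.168.10.0/24.
PATH = [
    ("vMX1", ("push", 301296, 301152)),       # BGP-LU label, then LDP label
    ("vMX2", ("pop",)),                       # PHP for the LDP LSP
    ("vMX3", ("swap_push", 301248, 301120)),  # stitch into the RSVP LSP
    ("vMX4", ("pop",)),                       # PHP for the RSVP LSP
    ("vMX5", ("swap", 301216)),               # stitch into the tail LDP LSP
    ("vMX6", ("pop",)),                       # PHP, frame leaves unlabeled
]


def walk(path):
    """Return the label stack each router sends toward the next hop."""
    stack, trace = [], []
    for router, op in path:
        stack = apply_op(stack, op)
        trace.append((router, tuple(stack)))
    return trace


for router, sent in walk(PATH):
    print(f"{router} sends {sent}")
```

Reading the trace top to bottom, the BGP-LU label is the only one that changes at the domain boundaries (vMX3 and vMX5), while the transport labels come and go inside each domain.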

Now – let’s finish up the post by making this a hair more complicated and throwing VPNv4 into the mix. So as we did in our last post, let’s convert the interfaces facing our clients to VRF type routing-instances…

vMX1

set routing-instances customer1 instance-type vrf
set routing-instances customer1 interface ge-0/0/0.0
set routing-instances customer1 route-distinguisher 1:1
set routing-instances customer1 vrf-target target:1:1
set routing-instances customer1 vrf-table-label 

vMX7

set routing-instances customer1 instance-type vrf
set routing-instances customer1 interface ge-0/0/1.0
set routing-instances customer1 route-distinguisher 1:1
set routing-instances customer1 vrf-target target:1:1
set routing-instances customer1 vrf-table-label 

Now we need to set up a VPNv4 peering to allow the advertisements for each VRF to be exchanged. There are a couple of ways we could do this, but for the sake of reusing what we already have let’s just add the inet-vpn address family to the existing BGP sessions…

vMX1

set protocols bgp group internal neighbor 3.3.3.3 family inet-vpn unicast 

vMX3

set protocols bgp group internal neighbor 1.1.1.1 family inet-vpn unicast    
set protocols bgp group external neighbor 5.5.5.5 family inet-vpn unicast  

vMX5

set protocols bgp group internal neighbor 7.7.7.7 family inet-vpn unicast                                       
set protocols bgp group external neighbor 3.3.3.3 family inet-vpn unicast 

vMX7

set protocols bgp group internal neighbor 5.5.5.5 family inet-vpn unicast 

Alright – so once that’s in – let’s make sure that the peering is up and that we’re exchanging VPNv4 routes…

[email protected]> show bgp summary 
Groups: 1 Peers: 1 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0               
                       0          0          0          0          0          0
bgp.l3vpn.0          
                       1          1          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
5.5.5.5               65002          5          5       0       0          55 Establ
  inet.0: 0/0/0/0
  bgp.l3vpn.0: 1/1/1/0
  customer1.inet.0: 1/1/1/0

[email protected]> show route table customer1.inet.0       

customer1.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[BGP/170] 00:00:45, localpref 100, from 5.5.5.5
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.10 via ge-0/0/0.0, Push 302336, Push 301376(top)
192.168.10.0/24    *[Direct/0] 00:03:55
                    > via ge-0/0/1.0
192.168.10.1/32    *[Local/0] 00:03:55
                      Local via ge-0/0/1.0

[email protected]> 

Nice! The route for the left_client has arrived and it appears to have been imported correctly. If we do a quick ping test, we should see that the clients have connectivity as well…

root@left_client:~# ping 192.168.10.100 -c 5
PING 192.168.10.100 (192.168.10.100) 56(84) bytes of data.
64 bytes from 192.168.10.100: icmp_seq=1 ttl=57 time=4.55 ms
64 bytes from 192.168.10.100: icmp_seq=2 ttl=57 time=5.11 ms
64 bytes from 192.168.10.100: icmp_seq=3 ttl=57 time=4.88 ms
64 bytes from 192.168.10.100: icmp_seq=4 ttl=57 time=5.55 ms
64 bytes from 192.168.10.100: icmp_seq=5 ttl=57 time=3.95 ms
 
--- 192.168.10.100 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 3.955/4.812/5.553/0.539 ms
root@left_client:~# 

But hold on a minute. That seemed too easy – and if we look carefully at the route output – we should notice something a little off. Recall that before we layered in the VPNv4 configuration, the tail routers had to push 2 labels to get traffic from end to end. One label was a local LDP label and the other was the BGP-LU label that was used to stitch the traffic into the next label domain. From the looks of the output above, we’re still only pushing 2 labels. So where’s our VPN label?

If you picked up on that, kudos. And if you know why we’re still only sending two labels – I’ll give you even more kudos. But before I talk through that, let’s try an experiment…

vMX1

delete protocols bgp group internal neighbor 3.3.3.3 family inet labeled-unicast 

vMX3

delete protocols bgp group internal neighbor 1.1.1.1 family inet labeled-unicast 
delete protocols bgp group external neighbor 5.5.5.5 family inet labeled-unicast 

vMX5

delete protocols bgp group internal neighbor 7.7.7.7 family inet labeled-unicast 
delete protocols bgp group external neighbor 3.3.3.3 family inet labeled-unicast 

vMX7

delete protocols bgp group internal neighbor 5.5.5.5 family inet labeled-unicast 

You may be surprised to see that after the BGP sessions re-peer, our traffic is once again working. If you’re entirely confused as to what’s going on – that’s totally fair at this point. What’s happened here is that we’ve stumbled upon yet another MPLS VPN architecture that falls into the “MPLS Inter-AS VPN” category of architectures. You can read more about that here but they are typically referred to as either option A, B, or C. Suffice to say – we just haven’t gotten there yet – we’re still in the process of building the blocks required for those to make sense. So at this point, I don’t want to pursue talking through what’s going on (which happens to be option B) but I do want to finish making this topology work using our existing LSP stitching. This will look a lot like an Inter-AS option C setup but for now don’t worry about what the setup looks like. We’re just going to focus on making this work and in the coming posts we’ll talk through the inter-AS options in much greater detail.

So we know that we want to use LSP stitching (BGP-LU) for our VPNv4 setup so let’s go ahead and put BGP-LU back in (just do a rollback 1 on vMX1, 3, 5, 7). Once that’s done let’s figure out why this isn’t working the way we thought it would.

Let’s think back to the MPLS VPN scenarios we’ve discussed previously. Do you recall what the requirement was for those to work? Well – typically we received a VPNv4 advertisement and the next-hop for that prefix was the remote PE router. This was important because that meant that the VPN label we received as part of that advertisement was used end to end. That is – it was at the bottom of the label stack and delivered as the top label to the PE that advertised it. Let’s see what we have now…

[email protected]> show route receive-protocol bgp 5.5.5.5 extensive table bgp.l3vpn.0 

bgp.l3vpn.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
* 1:1:10.10.10.0/24 (1 entry, 0 announced)
     Import Accepted
     Route Distinguisher: 1:1
     VPN Label: 302400
     Nexthop: 5.5.5.5
     Localpref: 100
     AS path: 65001 I 
     Communities: target:1:1

[email protected]> 

Well – I can tell you right now that doesn’t look right. The next hop is 5.5.5.5 instead of 1.1.1.1. Let’s see if the VPN label lines up with what vMX1 is sending….

[email protected]> show route advertising-protocol bgp 3.3.3.3 extensive 

customer1.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
* 10.10.10.0/24 (1 entry, 1 announced)
 BGP group internal type Internal
     Route Distinguisher: 1:1
     VPN Label: 16
     Nexthop: Self
     Flags: Nexthop Change
     Localpref: 100
     AS path: [65001] I 
     Communities: target:1:1

[email protected]> 

Nope. vMX1 is sending a VPN label of 16. So clearly – this isn’t working like we think it should. So it looks to me like vMX7 believes the route has a next-hop of 5.5.5.5 which is vMX5. This is default behavior in this case based on how we have the BGP peering configured. But we really want that next-hop to show up as vMX1. To do this – we can tell the border routers (vMX3 and vMX5) to not change the next hop of the routes they are advertising. Let’s try doing that…

vMX3

set protocols bgp group external neighbor 5.5.5.5 multihop no-nexthop-change 

vMX5

set protocols bgp group external neighbor 3.3.3.3 multihop no-nexthop-change 

You’ll notice that it’s a peer level change so it’s safe to assume that this impacts all of the route advertisements sent to that peer. If we look now – we’ll see that we’ve lost our remote routes in each customer VRF…

[email protected]> show route table customer1.inet.0  

customer1.inet.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.10.0/24    *[Direct/0] 00:27:06
                    > via ge-0/0/1.0
192.168.10.1/32    *[Local/0] 00:27:06
                      Local via ge-0/0/1.0

[email protected]> 

So let’s think about this for a second. In the first example we did above before we layered on VPNv4 – what prefix were we advertising from the tail routers? Since it was non-VPN we were simply advertising the prefix we wished to reach. But in a VPN scenario, what prefix do we need to reach and furthermore recursively resolve through the inet.3 table? The loopback of the remote PE! So we need to update our bgp-export policy to remove the client subnet and add the PE loopback address. Let’s try that out on vMX1 and see what happens…

vMX1

delete policy-options prefix-list bgp-export 10.10.10.0/24 
set policy-options prefix-list bgp-export 1.1.1.1/32 

So that prefix should be coming in the form of a BGP-LU advertisement into vMX3. Let’s check…

[email protected]> show route receive-protocol bgp 1.1.1.1 table inet.0 extensive 

inet.0: 20 destinations, 21 routes (20 active, 0 holddown, 0 hidden)
  1.1.1.1/32 (2 entries, 1 announced)
     Accepted
     Route Label: 3
     Nexthop: 1.1.1.1
     Localpref: 100
     AS path: I 
     Entropy label capable, next hop field matches route next hop

[email protected]> 

Perfect – now let’s make sure it’s being sent to vMX5….

[email protected]> show route advertising-protocol bgp 5.5.5.5 table inet.0 

[email protected]> 

Hmmmm. Any guesses? The problem is that vMX3 already has a better route for 1.1.1.1/32…

[email protected]> show route table inet.0 1.1.1.1/32 

inet.0: 20 destinations, 21 routes (20 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[OSPF/10] 00:34:32, metric 2
                    > to 169.254.10.2 via ge-0/0/0.0
                    [BGP/170] 00:05:59, localpref 100, from 1.1.1.1
                      AS path: I, validation-state: unverified
                    > to 169.254.10.2 via ge-0/0/0.0, Push 301328

[email protected]> 

It knows about it through OSPF. So huh. We can’t take it out of OSPF, otherwise we couldn’t peer within our AS. So what’s the fix? Typically, in this type of scenario, the loopback is readvertised into BGP at the AS border router. That is, we let vMX3 and vMX5 advertise the loopback into BGP since they already have it in OSPF. So let’s delete the entry we made on vMX1…

vMX1

delete policy-options prefix-list bgp-export 1.1.1.1/32 

And instead put this policy on vMX3 and vMX5…

vMX3

set policy-options prefix-list bgp-export 1.1.1.1/32 
set policy-options policy-statement bgp-export term direct from prefix-list bgp-export
set policy-options policy-statement bgp-export term direct then accept
set protocols bgp group external export bgp-export

vMX5

set policy-options prefix-list bgp-export 7.7.7.7/32 
set policy-options policy-statement bgp-export term direct from prefix-list bgp-export
set policy-options policy-statement bgp-export term direct then accept
set protocols bgp group external export bgp-export

Once that’s in – we should see that vMX3 is sending vMX5 the loopback for vMX1…

[email protected]> show route advertising-protocol bgp 5.5.5.5 table inet.0    

inet.0: 20 destinations, 21 routes (20 active, 0 holddown, 0 hidden)
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 1.1.1.1/32              Self                 2                  I

[email protected]> 
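The difference between the empty output we got earlier and the advertisement we see now comes down to which route is active. Here’s a rough Python sketch of that decision. It’s heavily simplified: the `active_route` and `is_advertised` functions are hypothetical, the only inputs are the default Junos preferences visible in the outputs (OSPF 10, BGP 170), and the export policy is reduced to a set of prefixes it accepts.

```python
# Default route preferences as shown in the route outputs ([OSPF/10], [BGP/170]).
PREFERENCE = {"OSPF": 10, "BGP": 170}


def active_route(candidates):
    """Pick the active route: the candidate with the lowest preference wins."""
    return min(candidates, key=lambda route: PREFERENCE[route["protocol"]])


def is_advertised(candidates, export_prefixes=frozenset()):
    """Simplified view of whether BGP advertises a prefix to a peer.

    Assumption: without an export policy only an active BGP route is sent;
    an export policy (modeled as a set of prefixes) can accept the active
    route no matter which protocol it came from.
    """
    best = active_route(candidates)
    if best["prefix"] in export_prefixes:
        return True
    return best["protocol"] == "BGP"


# vMX3 while vMX1 was still sending its loopback through BGP-LU.
before = [
    {"prefix": "1.1.1.1/32", "protocol": "OSPF"},
    {"prefix": "1.1.1.1/32", "protocol": "BGP"},
]
print(is_advertised(before))  # False: OSPF is active, the BGP copy is not

# vMX3 after adding the bgp-export policy for the loopback.
after = [{"prefix": "1.1.1.1/32", "protocol": "OSPF"}]
print(is_advertised(after, export_prefixes={"1.1.1.1/32"}))  # True
```

In both cases OSPF holds the active route on vMX3; the only thing that changed is that the export policy now tells BGP to advertise it.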

Now let’s see what vMX5 is doing with it…

[email protected]> show route table inet.0 1.1.1.1/32 

inet.0: 20 destinations, 21 routes (20 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[OSPF/10] 00:39:45, metric 4
                    > to 169.254.10.6 via ge-0/0/0.0
                    [BGP/170] 00:02:46, MED 2, localpref 100, from 3.3.3.3
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.6 via ge-0/0/0.0, label-switched-path to-vmx3

[email protected]> 

AHHHH! We have the same problem. Our sin of taking a shortcut and using one flat IGP domain is coming back to bite us. Let’s fix this by breaking the OSPF domain into 3 areas…

We’ll make the middle area the core or area 0, the left area area 1, and the right area area 2. Areas 1 and 2 will be stub areas and we’ll filter out most of the prefixes and loopbacks from entering area 0 except for the loopbacks of the border routers since that’s what’s used for peering etc.

vMX1

delete protocols ospf
set protocols ospf area 0.0.0.1 stub
set protocols ospf area 0.0.0.1 interface lo0.0
set protocols ospf area 0.0.0.1 interface ge-0/0/1.0 interface-type p2p

vMX2

delete protocols ospf
set protocols ospf area 0.0.0.1 stub
set protocols ospf area 0.0.0.1 interface lo0.0
set protocols ospf area 0.0.0.1 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.1 interface ge-0/0/1.0 interface-type p2p

vMX3

delete protocols ospf
set protocols ospf area 0.0.0.1 stub
set protocols ospf traffic-engineering
set protocols ospf area 0.0.0.1 interface lo0.0
set protocols ospf area 0.0.0.1 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 interface-type p2p
set protocols ospf area 0.0.0.0 network-summary-export ospf-export 
set policy-options prefix-list ospf-export 3.3.3.3/32
set policy-options policy-statement ospf-export term accept from prefix-list ospf-export 
set policy-options policy-statement ospf-export term accept then accept   
set policy-options policy-statement ospf-export then reject  

vMX4

delete protocols ospf
set protocols ospf traffic-engineering
set protocols ospf area 0.0.0.0 interface lo0.0
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 interface-type p2p

vMX5

delete protocols ospf
set protocols ospf area 0.0.0.2 stub
set protocols ospf traffic-engineering
set protocols ospf area 0.0.0.2 interface lo0.0
set protocols ospf area 0.0.0.2 interface ge-0/0/1.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.0 network-summary-export ospf-export 
set policy-options prefix-list ospf-export 5.5.5.5/32
set policy-options policy-statement ospf-export term accept from prefix-list ospf-export 
set policy-options policy-statement ospf-export term accept then accept   
set policy-options policy-statement ospf-export then reject 

vMX6

delete protocols ospf
set protocols ospf area 0.0.0.2 stub
set protocols ospf area 0.0.0.2 interface lo0.0
set protocols ospf area 0.0.0.2 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.2 interface ge-0/0/1.0 interface-type p2p

vMX7

delete protocols ospf
set protocols ospf area 0.0.0.2 stub
set protocols ospf area 0.0.0.2 interface lo0.0
set protocols ospf area 0.0.0.2 interface ge-0/0/0.0 interface-type p2p

Right – now that we have that in place – vMX5 should no longer be getting a 1.1.1.1/32 advertisement from OSPF so it should be passing the BGP advertisement onto vMX7…

[email protected]> show route advertising-protocol bgp 7.7.7.7                         

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 1.1.1.1/32              3.3.3.3              2       100        65001 I

[email protected]> 

Perfect! Now let’s see what vMX7 thinks about all of this…

[email protected]> show route 1.1.1.1/32 

inet.0: 16 destinations, 16 routes (16 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[BGP/170] 04:04:16, MED 2, localpref 100, from 5.5.5.5
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.10 via ge-0/0/0.0, Push 302656

[email protected]> 

Well – we now have the route that we need – but the route is in inet.0 and for MPLS VPNs to work we have to recurse through the inet.3 table. So how do we get the prefix there? We could do a rib-group copy but since this was learned through BGP-LU we can simply use the resolve-vpn flag on the BGP session to move it over as we saw in the last post. Let’s try that out…

vMX1

set protocols bgp group internal neighbor 3.3.3.3 family inet labeled-unicast resolve-vpn 

vMX7

set protocols bgp group internal neighbor 5.5.5.5 family inet labeled-unicast resolve-vpn 

Now that that’s there on each tail router, we should see the entry in the inet.3 table…

[email protected]> show route table inet.3 

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[BGP/170] 00:01:00, MED 2, localpref 100, from 5.5.5.5
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.10 via ge-0/0/0.0, Push 302656
5.5.5.5/32         *[LDP/9] 07:43:50, metric 1
                    > to 169.254.10.10 via ge-0/0/0.0, Push 301376
6.6.6.6/32         *[LDP/9] 07:43:50, metric 1
                    > to 169.254.10.10 via ge-0/0/0.0

[email protected]> 

Awesome. But if you look, you’ll see that the tail routers are no longer receiving the remote customer prefix. And if we look further, we’ll see that the boundary routers (vMX3 and vMX5) aren’t even passing it along anymore…

[email protected]> show route advertising-protocol bgp 7.7.7.7    

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 1.1.1.1/32              3.3.3.3              2       100        65001 I

[email protected]> 

If we look in the bgp.l3vpn.0 table we’ll see that it’s hidden….

[email protected]> show route table bgp.l3vpn.0 hidden extensive 

bgp.l3vpn.0: 2 destinations, 2 routes (1 active, 0 holddown, 1 hidden)
1:1:10.10.10.0/24 (1 entry, 0 announced)
         BGP    Preference: 170/-101
                Route Distinguisher: 1:1
                Next hop type: Unusable, Next hop index: 0
                Address: 0xa0ef164
                Next-hop reference count: 1
                State: <Hidden Ext ProtectionPath ProtectionCand>
                Local AS: 65002 Peer AS: 65001
                Age: 4:08:09 
                Validation State: unverified 
                Task: BGP_65001.3.3.3.3+54361
                AS path: 65001 I 
                Communities: target:1:1
                Accepted
                VPN Label: 16
                Localpref: 100
                Router ID: 3.3.3.3
                Indirect next hops: 1
                        Protocol next hop: 1.1.1.1
                        Label operation: Push 16
                        Label TTL action: prop-ttl
                        Load balance label: Label 16: None; 
                        Indirect next hop: 0x0 - INH Session ID: 0x0

[email protected]> 

It’s saying that the next-hop is unusable. Now why would that be? We know about the 1.1.1.1/32 prefix in the inet.0 table…

[email protected]> show route 1.1.1.1     

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[BGP/170] 04:08:38, MED 2, localpref 100, from 3.3.3.3
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.6 via ge-0/0/0.0, label-switched-path to-vmx3

[email protected]> 

Our problem is that we need to know about it in the inet.3 table for it to be a valid VPNv4 route. The fix for this is the same that we just did on the tail routers. We need to tell it to move the prefixes it learns through BGP-LU from inet.0 to inet.3 when they are listed as next-hops for VPNv4 prefixes. We’ll configure that on each boundary router on the external BGP peer…

vMX3

set protocols bgp group external neighbor 5.5.5.5 family inet labeled-unicast resolve-vpn

vMX5

set protocols bgp group external neighbor 3.3.3.3 family inet labeled-unicast resolve-vpn 

After adding that on both border routers, we should be in business…

[email protected]> show route 1.1.1.1 

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[BGP/170] 23:55:08, MED 2, localpref 100, from 3.3.3.3
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.6 via ge-0/0/0.0, label-switched-path to-vmx3

inet.3: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[BGP/170] 19:44:55, MED 2, localpref 100, from 3.3.3.3
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.6 via ge-0/0/0.0, label-switched-path to-vmx3

[email protected]> 
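
One rough way to picture what resolve-vpn changes here: the VPNv4 route is only usable once its protocol next hop has an entry in inet.3. The sketch below is a toy Python model rather than a description of Junos internals. The dictionaries, the trimmed-down table contents, and the function name `vpn_next_hop_usable` are all assumptions made for illustration.

```python
# Toy model: a VPN route is usable only when its protocol next hop
# has an entry in inet.3. Table contents are heavily simplified.

def vpn_next_hop_usable(protocol_next_hop, inet3):
    """Return True when the protocol next hop resolves in inet.3."""
    return protocol_next_hop in inet3

# Before resolve-vpn: the BGP-LU route for 1.1.1.1/32 is only in inet.0.
inet0 = {"1.1.1.1/32": "BGP-LU via label-switched-path to-vmx3"}
inet3 = {}
before = vpn_next_hop_usable("1.1.1.1/32", inet3)

# After resolve-vpn: the same route also shows up in inet.3.
inet3["1.1.1.1/32"] = inet0["1.1.1.1/32"]
after = vpn_next_hop_usable("1.1.1.1/32", inet3)

print(before, after)
```

Note that the route is present in both tables afterwards, which matches the output above where 1.1.1.1/32 appears under inet.0 and inet.3.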

Alright so let’s look at our tail routers again and see what’s up with our customer VRF routing table…

[email protected]> show route table customer1.inet.0                            

customer1.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[BGP/170] 19:46:07, localpref 100, from 5.5.5.5
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.10 via ge-0/0/0.0, Push 16, Push 302656(top)
192.168.10.0/24    *[Direct/0] 1d 03:30:41
                    > via ge-0/0/1.0
192.168.10.1/32    *[Local/0] 1d 03:30:41
                      Local via ge-0/0/1.0

[email protected]> 

Alright! We have the remote route now and if we dig a little deeper, we’ll see that it’s showing up with a protocol next-hop of 1.1.1.1 which is what we wanted…

customer1.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
10.10.10.0/24 (1 entry, 1 announced)
TSI:
KRT in-kernel 10.10.10.0/24 -> {indirect(1048575)}
        *BGP    Preference: 170/-101
                Route Distinguisher: 1:1
                Next hop type: Indirect, Next hop index: 0
                Address: 0xd007cf0
                Next-hop reference count: 3
                Source: 5.5.5.5
                Next hop type: Router, Next hop index: 581
                Next hop: 169.254.10.10 via ge-0/0/0.0, selected
                Label operation: Push 16, Push 302656(top)
                Label TTL action: prop-ttl, prop-ttl(top)
                Load balance label: Label 16: None; Label 302656: None; 
                Label element ptr: 0xd0072c0
                Label parent element ptr: 0xd006d80
                Label element references: 1
                Label element child references: 0
                Label element lsp id: 0
                Session Id: 0x192
                Protocol next hop: 1.1.1.1
                Label operation: Push 16
                Label TTL action: prop-ttl
                Load balance label: Label 16: None; 
                Indirect next hop: 0xb68bfe0 1048575 INH Session ID: 0x198
                State: <Secondary Active Int Ext ProtectionCand>
                Local AS: 65002 Peer AS: 65002
                Age: 19:46:04 	Metric2: 4 
                Validation State: unverified 
                Task: BGP_65002.5.5.5.5+49544
                Announcement bits (1): 0-KRT 
                AS path: 65001 I 
                Communities: target:1:1
                Import Accepted
                VPN Label: 16
                Localpref: 100
                Router ID: 5.5.5.5
                Primary Routing Table bgp.l3vpn.0
                Indirect next hops: 1
                        Protocol next hop: 1.1.1.1 Metric: 4
                        Label operation: Push 16
                        Label TTL action: prop-ttl
                        Load balance label: Label 16: None; 
                        Indirect next hop: 0xb68bfe0 1048575 INH Session ID: 0x198
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 169.254.10.10 via ge-0/0/0.0
                                Session Id: 0x192
			1.1.1.1/32 Originating RIB: inet.3
			  Metric: 4			  Node path count: 1
                                        
[email protected]> 

Cool – but take a look at that label operation. We’re still only pushing two labels at this point, which doesn’t align with our thinking of the LSPs as being transport and the VPN label as being end to end. So what’s going on now? Let’s step back a bit and remember what had to happen in the non-VPN example we did above. To get from label domain to label domain, the next hop for the resolved route had to appear to be within our own label domain. That is – our ingress router had to believe that the resolved next hop lived within its own AS. Now we know that our VPN route has had its next-hop preserved as 1.1.1.1 on vMX7, which is what we wanted, but what does the route for 1.1.1.1/32 actually look like…

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
1.1.1.1/32 (1 entry, 1 announced)
        *BGP    Preference: 170/-101
                Next hop type: Indirect, Next hop index: 0
                Address: 0xd006d90
                Next-hop reference count: 4
                Source: 5.5.5.5
                Next hop type: Router, Next hop index: 610
                Next hop: 169.254.10.10 via ge-0/0/0.0, selected
                Label operation: Push 302656
                Label TTL action: prop-ttl
                Load balance label: Label 302656: None; 
                Label element ptr: 0xd007500
                Label parent element ptr: 0x0
                Label element references: 2
                Label element child references: 1
                Label element lsp id: 0
                Session Id: 0x192
                Protocol next hop: 3.3.3.3
                Label operation: Push 302656
                Label TTL action: prop-ttl
                Load balance label: Label 302656: None; 
                Indirect next hop: 0xb68cb90 1048574 INH Session ID: 0x19b
                State: <Secondary Active Int Ext>
                Local AS: 65002 Peer AS: 65002
                Age: 51         Metric: 2       Metric2: 4 
                Validation State: unverified 
                Task: BGP_65002.5.5.5.5+49544
                Announcement bits (3): 2-Resolve tree 1 3-Resolve tree 2 4-Resolve_IGP_FRR task 
                AS path: 65001 I 
                Accepted
                Route Label: 302656
                Localpref: 100
                Router ID: 5.5.5.5
                Primary Routing Table inet.0
                Indirect next hops: 1
                        Protocol next hop: 3.3.3.3 Metric: 4
                        Label operation: Push 302656
                        Label TTL action: prop-ttl
                        Load balance label: Label 302656: None; 
                        Indirect next hop: 0xb68cb90 1048574 INH Session ID: 0x19b
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 169.254.10.10 via ge-0/0/0.0
                                Session Id: 0x192
                        3.3.3.3/32 Originating RIB: inet.0
                          Metric: 4                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 169.254.10.10 via ge-0/0/0.0
                                Session Id: 192

[email protected]> 

Aha! The protocol next-hop we’re getting for that route is the loopback of vMX3, which is not within our label domain/AS. If we want BGP-LU to do our stitching as part of the transport LSP, we need to tell each border router to do a next-hop self on the BGP-LU route that it advertises to the tail router in its own AS. Let’s try that out…

vMX3 and vMX5

set policy-options policy-statement nhs term nhs from family inet 
set policy-options policy-statement nhs term nhs then next-hop self
set protocols bgp group internal export nhs

Now let’s look at that route again…

[email protected]> show route 1.1.1.1 table inet.3 extensive    

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
1.1.1.1/32 (1 entry, 1 announced)
        *BGP    Preference: 170/-101
                Next hop type: Indirect, Next hop index: 0
                Address: 0xd0078d0
                Next-hop reference count: 4
                Source: 5.5.5.5
                Next hop type: Router, Next hop index: 613
                Next hop: 169.254.10.10 via ge-0/0/0.0, selected
                Label operation: Push 302496, Push 301376(top)
                Label TTL action: prop-ttl, prop-ttl(top)
                Load balance label: Label 302496: None; Label 301376: None; 
                Label element ptr: 0xd007f20
                Label parent element ptr: 0xd006e40
                Label element references: 2
                Label element child references: 1
                Label element lsp id: 0
                Session Id: 0x192
                Protocol next hop: 5.5.5.5
                Label operation: Push 302496
                Label TTL action: prop-ttl
                Load balance label: Label 302496: None; 
                Indirect next hop: 0xb68c640 1048576 INH Session ID: 0x19c
                State: <Secondary Active Int Ext>
                Local AS: 65002 Peer AS: 65002
                Age: 9  Metric: 2       Metric2: 1 
                Validation State: unverified 
                Task: BGP_65002.5.5.5.5+49544
                Announcement bits (3): 2-Resolve tree 1 3-Resolve tree 2 4-Resolve_IGP_FRR task 
                AS path: 65001 I 
                Accepted
                Route Label: 302496
                Localpref: 100
                Router ID: 5.5.5.5
                Primary Routing Table inet.0
                Indirect next hops: 1
                        Protocol next hop: 5.5.5.5 Metric: 1
                        Label operation: Push 302496
                        Label TTL action: prop-ttl
                        Load balance label: Label 302496: None; 
                        Indirect next hop: 0xb68c640 1048576 INH Session ID: 0x19c
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 169.254.10.10 via ge-0/0/0.0
                                Session Id: 0x192
                        5.5.5.5/32 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 169.254.10.10 via ge-0/0/0.0
                                Session Id: 0

[email protected]> 

Bingo! Now let’s look at the route in the customer VRF for the remote prefix…

[email protected]> show route table customer1.inet.0    

customer1.inet.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.10.0/24      *[BGP/170] 00:02:36, localpref 100, from 5.5.5.5
                      AS path: 65001 I, validation-state: unverified
                    > to 169.254.10.10 via ge-0/0/0.0, Push 16, Push 302496, Push 301376(top)
192.168.10.0/24    *[Direct/0] 1d 04:38:57
                    > via ge-0/0/1.0
192.168.10.1/32    *[Local/0] 1d 04:38:57
                      Local via ge-0/0/1.0

[email protected]> 
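
The difference between the two-label and three-label results can be sketched as a recursive lookup: each time the protocol next hop resolves through a labeled route in inet.3, another label goes on the stack. This is a toy Python model only. The label values come from the outputs above, but the dictionary layout and the `build_stack` helper are hypothetical.

```python
# Toy model of recursive next-hop resolution on the tail router (vMX7).
# Each inet.3 entry maps a prefix to (label to push, protocol next hop).
# A next hop of None means there is nothing further to resolve.

def build_stack(prefix, vpn_label, inet3):
    """Return the label stack with the bottom label first and the top label last."""
    stack = [vpn_label]
    hop = prefix
    while hop is not None and hop in inet3:
        label, hop = inet3[hop]
        stack.append(label)
    return stack

# Before next-hop self: the BGP-LU route points at 3.3.3.3, which vMX7
# resolves in inet.0, so no transport label is added.
inet3_before = {"1.1.1.1": (302656, "3.3.3.3")}

# After next-hop self: it points at 5.5.5.5, which resolves in inet.3
# through the LDP LSP.
inet3_after = {
    "1.1.1.1": (302496, "5.5.5.5"),
    "5.5.5.5": (301376, None),
}

stack_before = build_stack("1.1.1.1", 16, inet3_before)
stack_after = build_stack("1.1.1.1", 16, inet3_after)
print(stack_before)
print(stack_after)
```

Reading each list from left to right gives the VPN label at the bottom, then the BGP-LU stitching label, then (in the second case) the LDP transport label on top.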

And at long last – the routing tables look as they should. We now have the 3 labels we were looking for. Whew. But… Our ping still isn’t working….

left_client:~# ping 192.168.10.100 -c 5
PING 192.168.10.100 (192.168.10.100) 56(84) bytes of data.

--- 192.168.10.100 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

left_client:~# 

Grumble. Ok. At this point, let’s do a label trace from tail router to tail router and see what’s going on now. From the output above, we know that vMX6 is going to get a top label of 301376. Presumably this is for the LDP LSP but let’s make sure…

[email protected]> show ldp database 
Input label database, 7.7.7.7:0--6.6.6.6:0
Labels received: 3
  Label     Prefix
 301376      5.5.5.5/32
      3      6.6.6.6/32
 301408      7.7.7.7/32

Output label database, 7.7.7.7:0--6.6.6.6:0
Labels advertised: 3
  Label     Prefix
 299984      5.5.5.5/32
 299968      6.6.6.6/32
      3      7.7.7.7/32

[email protected]> 

Yep – ok, that checks out. Let’s see what vMX6 does with this label now…

[email protected]> show route table mpls.0 label 301376 

mpls.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

301376             *[LDP/9] 1d 04:43:40, metric 1
                    > to 169.254.10.8 via ge-0/0/0.0, Pop      
301376(S=0)        *[LDP/9] 1d 04:43:40, metric 1
                    > to 169.254.10.8 via ge-0/0/0.0, Pop      

[email protected]> 

Alright – it is the PHP router for the LDP LSP so it will pop that label and pass along what’s left to vMX5. So vMX5 will get a top label of 302496 which if we go back and look at vMX7 again, we’ll see is the BGP-LU stitching label that it received…

[email protected]> show route receive-protocol bgp 5.5.5.5 1.1.1.1/32 table inet.3 extensive 

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
* 1.1.1.1/32 (1 entry, 1 announced)
     Accepted
     Route Label: 302496
     Nexthop: 5.5.5.5
     MED: 2
     Localpref: 100
     AS path: 65001 I 
     Entropy label capable, next hop field matches route next hop

[email protected]> 

Checks out. Let’s see what vMX5 will do with that label now…

[email protected]> show route table mpls.0 label 302496 extensive 

mpls.0: 12 destinations, 12 routes (12 active, 0 holddown, 0 hidden)
302496 (1 entry, 1 announced)
TSI:
KRT in-kernel 302496 /52 -> {indirect(1048575)}
        *VPN    Preference: 170
                Next hop type: Indirect, Next hop index: 0
                Address: 0xd007510
                Next-hop reference count: 2
                Source: 3.3.3.3
                Next hop type: Router, Next hop index: 589
                Next hop: 169.254.10.6 via ge-0/0/0.0, selected
                Label-switched-path to-vmx3
                Label operation: Swap 302656, Push 301552(top)
                Label TTL action: prop-ttl, prop-ttl(top)
                Load balance label: Label 302656: None; Label 301552: None; 
                Label element ptr: 0xd0076e0
                Label parent element ptr: 0xd007c20
                Label element references: 1
                Label element child references: 0
                Label element lsp id: 0
                Session Id: 0x19a
                Protocol next hop: 3.3.3.3
                Label operation: Swap 302656
                Load balance label: Label 302656: None; 
                Indirect next hop: 0xb68d300 1048575 INH Session ID: 0x1a5
                State: <Active Int Ext>
                Local AS: 65002 
                Age: 7:57 	Metric2: 2 
                Validation State: unverified 
                Task: BGP_RT_Background
                Announcement bits (1): 1-KRT 
                AS path: 65001 I 
		Ref Cnt: 1
                Indirect next hops: 1
                        Protocol next hop: 3.3.3.3 Metric: 2
                        Label operation: Swap 302656
                        Load balance label: Label 302656: None; 
                        Indirect next hop: 0xb68d300 1048575 INH Session ID: 0x1a5
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 169.254.10.6 via ge-0/0/0.0
                                Session Id: 0x19a
			3.3.3.3/32 Originating RIB: inet.3
			  Metric: 2			  Node path count: 1
			  Forwarding nexthops: 1
				Nexthop: 169.254.10.6 via ge-0/0/0.0
				Session Id: 0

[email protected]> 

Since this is going into an RSVP LSP we have to do the extensive output to see the label operation but as you can see it’s going to swap 302496 for 302656 and then push another label of 301552. So vMX4 will be getting a top label of 301552 followed by 302656 and then our VPN label of 16. Ok so what does vMX4 do with that top label?

[email protected]> show route table mpls.0 label 301552 extensive 

mpls.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
301552 (1 entry, 1 announced)
TSI:
KRT in-kernel 301552 /52 -> {Pop       Flags acct EL-capable}
        *RSVP   Preference: 7/1
                Next hop type: Router, Next hop index: 585
                Address: 0xd007450
                Next-hop reference count: 2
                Next hop: 169.254.10.4 via ge-0/0/0.0, selected
                Label-switched-path to-vmx3
                Label operation: Pop      
                Load balance label: None; 
                Label element ptr: 0xd007560
                Label parent element ptr: 0x0
                Label element references: 2
                Label element child references: 0
                Label element lsp id: 0
                Session Id: 0x181
                State: <Active Int AckRequest Accounting EL-capable>
                Age: 1d 4:48:51         Metric: 1 
                Validation State: unverified 
                Task: RSVP
                Announcement bits (1): 1-KRT 
                AS path: I 

301552(S=0) (1 entry, 1 announced)
TSI:
KRT in-kernel 301552 /56 -> {Pop       Flags acct EL-capable}
        *RSVP   Preference: 7/1
                Next hop type: Router, Next hop index: 586
                Address: 0xd006f70
                Next-hop reference count: 2
                Next hop: 169.254.10.4 via ge-0/0/0.0, selected
                Label-switched-path to-vmx3
                Label operation: Pop      
                Load balance label: None; 
                Label element ptr: 0xd006fc0
                Label parent element ptr: 0x0
                Label element references: 1
                Label element child references: 0
                Label element lsp id: 0
                Session Id: 0x181
                State: <Active Int AckRequest Accounting EL-capable>
                Age: 1d 4:48:51         Metric: 1 
                Validation State: unverified 
                Task: RSVP
                Announcement bits (1): 1-KRT 
                AS path: I 

[email protected]> 

Alright – so it’s going to pop it. Which again – makes sense since we are now in the RSVP domain and vMX4 would be the PHP router for this LSP to vMX3. So vMX3 is going to get a top label of 302656. Let’s see what vMX3 does with that…

[email protected]> show route table mpls.0 label 302656 

mpls.0: 12 destinations, 12 routes (12 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

302656             *[VPN/170] 1d 01:13:08
                    > to 169.254.10.2 via ge-0/0/0.0, Pop      
302656(S=0)        *[VPN/170] 1d 01:13:08
                    > to 169.254.10.2 via ge-0/0/0.0, Pop      

[email protected]> 

Pop… Huh. That doesn’t seem right. That would mean that vMX2 would be getting just the VPN label. Let’s do a packet capture between vMX3 and vMX4 to see if that’s what’s going on…

Note: I’m assuming in this case that you’re trying to do a ping from right_client to left_client. I realized as I was writing this that I’ve been doing all my other tests from left to right, so apologies for the change-up as part of this trace.

Alright – so sure enough – that’s what it’s doing. As we know – the VPN label is only relevant to the PE that’s advertising it. In this case, vMX2 is going to be getting a frame with the VPN label as its only, and therefore top, label. And if our assumptions are correct, vMX2 will not know what to do with it…

[email protected]> show route table mpls.0 label 16 

[email protected]> show route forwarding-table label 16 
Routing table: default.mpls
MPLS:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd       50     1

Routing table: __mpls-oam__.mpls
MPLS:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      553     1

[email protected]> 
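
The whole trace so far can be replayed as a series of stack operations. The sketch below is a toy Python model: the labels and operations are the ones from the outputs above, while the `apply_op` helper and the `path` structure are made up for illustration. The empty set for vMX2 is a simplification that only captures the fact that label 16 is not in its table.

```python
# Toy label trace from vMX7 toward vMX1.
# The stack is a list with the top label last.

def apply_op(stack, op, label=None):
    """Apply one MPLS operation and return the resulting stack."""
    stack = list(stack)
    if op == "push":
        stack.append(label)
    elif op == "pop":
        stack.pop()
    elif op == "swap":
        stack[-1] = label
    else:
        raise ValueError(f"unknown operation: {op}")
    return stack

# (router, operations it applies before forwarding)
path = [
    ("vMX7", [("push", 16), ("push", 302496), ("push", 301376)]),
    ("vMX6", [("pop", None)]),                       # PHP for the LDP LSP
    ("vMX5", [("swap", 302656), ("push", 301552)]),  # stitch into the RSVP LSP
    ("vMX4", [("pop", None)]),                       # PHP for the RSVP LSP
    ("vMX3", [("pop", None)]),                       # the pop we did not expect
]

stack = []
for router, ops in path:
    for op, label in ops:
        stack = apply_op(stack, op, label)
    print(f"{router} sends {stack}")

# vMX2 has no mpls.0 entry for label 16, modeled here as an empty set.
vmx2_labels = set()
delivered = stack[-1] in vmx2_labels
print("vMX2 forwards" if delivered else "vMX2 discards")
```

The interesting step is the last one: vMX3 removes the stitching label instead of swapping it, so only the VPN label is left by the time the frame reaches vMX2.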

Yep – so it’s going to discard. So what’s going on now? The problem seems to be on vMX3. We know that vMX3 is responsible for stitching the traffic coming from the RSVP LSP into the LDP LSP to vMX1. So let’s take a look at vMX3 and see if we have an LSP to vMX1…

[email protected]> show route table inet.3 

inet.3: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[LDP/9] 1d 01:21:19, metric 1
                    > to 169.254.10.2 via ge-0/0/0.0, Push 301328
2.2.2.2/32         *[LDP/9] 1d 01:21:19, metric 1
                    > to 169.254.10.2 via ge-0/0/0.0
5.5.5.5/32         *[RSVP/7/1] 1d 04:58:22, metric 2
                    > to 169.254.10.5 via ge-0/0/1.0, label-switched-path to-vmx5
7.7.7.7/32         *[BGP/170] 21:10:16, MED 2, localpref 100, from 5.5.5.5
                      AS path: 65002 I, validation-state: unverified
                    > to 169.254.10.5 via ge-0/0/1.0, label-switched-path to-vmx5

[email protected]>

Alright so that looks good. So what’s going on here? Well – this sentence from the Juniper documentation describes the issue pretty well…

The only use for inet.3 or inet6.3 is to permit BGP to perform next-hop resolution.

Put more simply – BGP is the only protocol that will recursively resolve its next hop in the inet.3 table. So if vMX3 can’t resolve the LSP in inet.3 then what does it do? It looks in inet.0.

[email protected]> show route table inet.0 1.1.1.1/32   

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[OSPF/10] 1d 01:33:06, metric 2
                    > to 169.254.10.2 via ge-0/0/0.0

[email protected]> 

And in inet.0 we have a route to 1.1.1.1/32 through OSPF. Since vMX3 thinks that its best path to 1.1.1.1/32 is through OSPF, it will send a BGP-LU label to vMX5 indicating that it should pop off the label so that it can forward the packet normally. The problem is that once the frame gets to vMX3 – there are still two labels on the frame, meaning that after the pop vMX2 will receive a frame with a single VPN label which it has no idea what to do with.

The fix for this is to get the forwarding information required from inet.3 into the inet.0 table. There are many many many ways to do this, but in my opinion, the easiest and the safest way to do this is by telling the router to use routes from inet.3 for forwarding only…

vMX3 and vMX5

set protocols mpls traffic-engineering mpls-forwarding 

After we add that our ping should start working! Let’s now look at the inet.0 table on vMX3 again for the 1.1.1.1/32 prefix…

[email protected]> show route table inet.0 1.1.1.1/32 

inet.0: 17 destinations, 20 routes (17 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         @[OSPF/10] 00:00:38, metric 2
                    > to 169.254.10.2 via ge-0/0/0.0
                   #[LDP/9] 00:00:38, metric 1
                    > to 169.254.10.2 via ge-0/0/0.0, Push 301328

[email protected]> 
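
The @ and # markers can be read as two separate selections over the same pair of routes. The sketch below is a simplified Python model of that idea and not a description of how Junos implements it. The preference values and the label come from the output above, while the dictionary fields and function names are assumptions.

```python
# Toy model of the two candidate routes for 1.1.1.1/32 in inet.0 on vMX3
# once mpls-forwarding is enabled.

routes = [
    {"protocol": "OSPF", "preference": 10, "label": None, "lsp": False},
    {"protocol": "LDP", "preference": 9, "label": 301328, "lsp": True},
]

def forwarding_route(candidates):
    """Forwarding considers every route, and the lowest preference wins."""
    return min(candidates, key=lambda r: r["preference"])

def routing_route(candidates):
    """Routing decisions skip the LSP routes, which are forwarding-only."""
    non_lsp = [r for r in candidates if not r["lsp"]]
    return min(non_lsp, key=lambda r: r["preference"])

fwd = forwarding_route(routes)
rt = routing_route(routes)
print("#", fwd["protocol"], "push", fwd["label"])
print("@", rt["protocol"])
```

With a labeled route now chosen for forwarding, vMX3 has a label to send the traffic on with toward vMX1 rather than only an unlabeled OSPF next hop.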

Now we can see that the LDP route (with the lower preference value) is in the table and has the # next to it, indicating that it’s being used just for forwarding traffic and not as part of the route selection process. So now our picture should look like this…

Alright – well that was a long road to get here. But we finally made it, and walking through troubleshooting something like this is always helpful in my opinion. Clearly, we had to jump through some hoops because of the way our BGP peering was set up, so in the next post we’ll talk through some other options for how to do this. If any of this seemed confusing to you, I’d encourage you to read the post on BGP-LU as well as the one on MPLS VPNs with BGP-LU.

4 thoughts on “Fundamentals of MPLS LSPs”

  1. Tomasi

    Great post!

    Is this something like Segment Routing?

    In a backbone with multiple IGP paths, where are the best places to run an RSVP LSP or an LDP LSP?

    Thanks!
