LDP Multipath/ECMP

In one of our last posts on MPLS – we showed how LDP can be used to dynamically exchange labels between MPLS enabled routers. This was a huge improvement over statically configured LSPs. We saw that LDP was tightly coupled to the underlying IGP but we didn’t spend a lot of time showing examples of that. In this post, I want to extend our lab a little bit to show you exactly how tight this coupling is. To do so, we’ll use part of the extended lab that we created at the end of our last post on the JunOS routing table. For the sake of being thorough, we’ll provide the entire configuration of each device. The lab will look like what’s shown below…

For a base configuration – we’re going to start with almost everything in place, with the exception of MPLS and LDP. We’ll then add those in stages so we can see how things look before and after we have multiple IGP paths to the same destination…

vMX1 Configuration…

system {
    host-name vmx1.lab;
}
interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.2.2.0/31;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {               
                address 10.1.1.0/31;
            }
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 10.1.1.8/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 1.1.1.1/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 1.1.1.1;
    autonomous-system 65000;
}
protocols {
    bgp {
        group internal {
            type internal;
            local-address 1.1.1.1;
            export redistribute_connected_static;
            neighbor 4.4.4.4;
        }
    }
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/1.0;
            interface ge-0/0/2.0;
        }
    }
}
policy-options {
    prefix-list tp_bgp;
    prefix-list to_bgp {
        10.2.2.0/31;
    }
    policy-statement redistribute_connected_static {
        from {
            protocol [ direct static ];
            prefix-list to_bgp;
        }
        then accept;
    }
}

vMX2 Configuration…

system {
    host-name vmx2.lab;
}
interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.1.1.1/31;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {               
                address 10.1.1.2/31;
            }
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 10.1.1.6/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 2.2.2.2/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 2.2.2.2;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0;
            interface ge-0/0/1.0;
            interface ge-0/0/2.0;
        }
    }
}

vMX3 Configuration…

system {
    host-name vmx3.lab;
}
interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.1.1.3/31;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {               
                address 10.1.1.4/31;
            }
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 10.1.1.9/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 3.3.3.3/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 3.3.3.3;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0;
            interface ge-0/0/1.0;
            interface ge-0/0/2.0;
        }
    }
}

vMX4 Configuration…

system {
    host-name vmx4.lab;
}
interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.1.1.5/31;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {               
                address 10.2.2.2/31;
            }
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 10.1.1.7/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 4.4.4.4/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 4.4.4.4;
    autonomous-system 65000;
}
protocols {
    bgp {
        group internal {
            type internal;
            local-address 4.4.4.4;
            export redistribute_connected_static;
            neighbor 1.1.1.1;
        }
    }
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0;
            interface ge-0/0/2.0;
        }
    }
}
policy-options {
    prefix-list to_bgp {
        10.2.2.2/31;
    }
    policy-statement redistribute_connected_static {
        from {
            protocol [ direct static ];
            prefix-list to_bgp;
        }
        then accept;
    }
}

So nothing too crazy here – all the routers are OSPF neighbors and the two edge routers (vMX1 and vMX4) are iBGP peered through their loopbacks. At this point, each edge router should see the other router’s customer prefix, but the customers should have no connectivity to each other. This is because while vMX1 has a route for 10.2.2.2/31, it’s reachable only through vMX2 and vMX3, which have no idea about the prefix since they’re not privy to the routing information being exchanged in BGP. To fix this, we can configure MPLS and LDP so that vMX1 and vMX4 can tunnel the traffic in MPLS between each other. Let’s start by configuring a single path between vMX1 and vMX4…

To enable this path, we’ll run the following configurations on the following routers…

vMX1 Configuration…

set interfaces ge-0/0/1 unit 0 family mpls 
set protocols mpls interface ge-0/0/1 
set protocols ldp interface ge-0/0/1 

vMX2 Configuration…

set interfaces ge-0/0/0 unit 0 family mpls 
set protocols mpls interface ge-0/0/0 
set protocols ldp interface ge-0/0/0 
set interfaces ge-0/0/2 unit 0 family mpls    
set protocols mpls interface ge-0/0/2         
set protocols ldp interface ge-0/0/2 

vMX4 Configuration…

set interfaces ge-0/0/2 unit 0 family mpls 
set protocols mpls interface ge-0/0/2 
set protocols ldp interface ge-0/0/2 

After committing the above configuration – your clients should now have connectivity between each other…

Note: I’m using NetNS in my lab to simulate the clients hence the syntax below…

root@the-lab:~# ip netns exec bot_user ping 10.2.2.1 -c 2
PING 10.2.2.1 (10.2.2.1) 56(84) bytes of data.
64 bytes from 10.2.2.1: icmp_seq=1 ttl=61 time=1.68 ms
64 bytes from 10.2.2.1: icmp_seq=2 ttl=61 time=1.64 ms

--- 10.2.2.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.640/1.660/1.681/0.045 ms
root@the-lab:~# 

Great! But nothing too exciting here – this is the most basic of MPLS configurations. Let’s take a quick look around to make sure we understand what’s going on here though. vMX1 is going to be our ingress MPLS node for this traffic flow so let’s look and see what’s going on there…

[email protected]> show route table inet.0 10.2.2.2/31    

inet.0: 19 destinations, 19 routes (19 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.2.2.2/31        *[BGP/170] 00:12:35, localpref 100, from 4.4.4.4
                      AS path: I, validation-state: unverified
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299792

[email protected]> 

As expected – our route is still here – but note that it now includes an MPLS operation of Push 299792. Where did this come from?

[email protected]> show ldp database    
Input label database, 1.1.1.1:0--2.2.2.2:0
Labels received: 3
  Label     Prefix
 299776      1.1.1.1/32
      3      2.2.2.2/32
 299792      4.4.4.4/32

Output label database, 1.1.1.1:0--2.2.2.2:0
Labels advertised: 3
  Label     Prefix
      3      1.1.1.1/32
 299776      2.2.2.2/32
 299792      4.4.4.4/32

[email protected]>

That’s the label that vMX2 told us to use to reach the prefix 4.4.4.4/32. We could have also seen this in our inet.3 routing table which, if you recall from earlier posts, is what the MX uses to resolve LSP endpoints. In other words – it’s where it stores all of the LSP endpoints (which is yet another reason I prefer Juniper’s MPLS implementation over Cisco’s, but I digress)…

[email protected]> show route table inet.3 

inet.3: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[LDP/9] 00:17:05, metric 1
                    > to 10.1.1.1 via ge-0/0/1.0
4.4.4.4/32         *[LDP/9] 00:16:17, metric 1
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299792

[email protected]> 

Notice the same label is listed here as well. If we look at vMX2 we’ll see that it’s learning an implicit null label (3) from 4.4.4.4 so that vMX2 can do PHP…

[email protected]> show ldp database 
Input label database, 2.2.2.2:0--1.1.1.1:0
Labels received: 3
  Label     Prefix
      3      1.1.1.1/32
 299776      2.2.2.2/32
 299792      4.4.4.4/32

Output label database, 2.2.2.2:0--1.1.1.1:0
Labels advertised: 3
  Label     Prefix
 299776      1.1.1.1/32
      3      2.2.2.2/32
 299792      4.4.4.4/32

Input label database, 2.2.2.2:0--4.4.4.4:0
Labels received: 3
  Label     Prefix
 299792      1.1.1.1/32
 299776      2.2.2.2/32
      3      4.4.4.4/32

Output label database, 2.2.2.2:0--4.4.4.4:0
Labels advertised: 3
  Label     Prefix
 299776      1.1.1.1/32
      3      2.2.2.2/32
 299792      4.4.4.4/32

[email protected]> 
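
To make the label handling concrete, here’s a minimal Python sketch of the single LSP from vMX1 to vMX4. The label values come straight from the LDP databases above, but the tables and the forward function are just my own illustrative model, not anything Junos exposes.

```python
# Toy model of the single LSP vMX1 -> vMX2 -> vMX4 using the label values
# from the LDP databases shown above. The data structures are illustrative,
# not a Junos API.
IMPLICIT_NULL = 3  # reserved label that asks the upstream router to pop (PHP)

# Ingress: FEC -> (label to push, next router), as seen in inet.3 on vMX1.
INGRESS = {"vMX1": {"4.4.4.4/32": (299792, "vMX2")}}

# Transit: incoming label -> (outgoing label, next router), as on vMX2.
TRANSIT = {"vMX2": {299792: (IMPLICIT_NULL, "vMX4")}}


def forward(ingress, fec):
    """Return a list of (router, action, label stack after the action)."""
    label, next_router = INGRESS[ingress][fec]
    stack = [label]
    hops = [(ingress, f"push {label}", tuple(stack))]
    router = next_router
    while stack:
        out_label, next_router = TRANSIT[router][stack[-1]]
        if out_label == IMPLICIT_NULL:
            stack.pop()  # penultimate hop pops; egress gets a plain IP packet
            hops.append((router, "pop", tuple(stack)))
        else:
            old = stack[-1]
            stack[-1] = out_label
            hops.append((router, f"swap {old}->{out_label}", tuple(stack)))
        router = next_router
    hops.append((router, "ip lookup", tuple(stack)))
    return hops


for hop in forward("vMX1", "4.4.4.4/32"):
    print(hop)
```

Because vMX4 advertised label 3 for its own loopback, vMX2 pops instead of swapping, and vMX4 only has to do a regular IP lookup.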

So again – nothing exciting here – we’re just validating what we already know. JunOS also has a handy MPLS traceroute tool we can use to validate the path we’re seeing to the egress LSP router…

[email protected]> traceroute mpls ldp 4.4.4.4/32         
  Probe options: ttl 64, retries 3, wait 10, paths 16, exp 7, fanout 16

  ttl    Label  Protocol    Address          Previous Hop     Probe Status
    1   299792  LDP         10.1.1.1         (null)           Success           
  FEC-Stack-Sent: LDP 
  ttl    Label  Protocol    Address          Previous Hop     Probe Status
    2        3  LDP         10.1.1.7         10.1.1.1         Egress            
  FEC-Stack-Sent: LDP 

  Path 1 via ge-0/0/1.0 destination 127.0.0.64


[email protected]> 

That’s also a great way to see all the labels being used in the path. So now let’s enable all the other MPLS interfaces so we can get multiple paths…

vMX1 Configuration…

set interfaces ge-0/0/2 unit 0 family mpls  
set protocols mpls interface ge-0/0/2        
set protocols ldp interface ge-0/0/2

vMX2 Configuration…

set interfaces ge-0/0/1 unit 0 family mpls  
set protocols mpls interface ge-0/0/1 
set protocols ldp interface ge-0/0/1 

vMX3 Configuration…

set interfaces ge-0/0/0 unit 0 family mpls 
set interfaces ge-0/0/1 unit 0 family mpls    
set interfaces ge-0/0/2 unit 0 family mpls    
set protocols mpls interface ge-0/0/0 
set protocols mpls interface ge-0/0/1    
set protocols mpls interface ge-0/0/2    
set protocols ldp interface ge-0/0/0 
set protocols ldp interface ge-0/0/1 
set protocols ldp interface ge-0/0/2

vMX4 Configuration…

set interfaces ge-0/0/0 unit 0 family mpls 
set protocols mpls interface ge-0/0/0
set protocols ldp interface ge-0/0/0

After the above configuration has been committed – let’s look at the route for 10.2.2.2/31 on vMX1 again…

[email protected]> show route table inet.0 10.2.2.2/31    

inet.0: 19 destinations, 19 routes (19 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.2.2.2/31        *[BGP/170] 00:28:24, localpref 100, from 4.4.4.4
                      AS path: I, validation-state: unverified
                      to 10.1.1.1 via ge-0/0/1.0, Push 299792
                    > to 10.1.1.9 via ge-0/0/2.0, Push 299808

[email protected]>

Aha! So looks to me like the RIB sure thinks it can get to 4.4.4.4 two different ways. One through vMX2 and one through vMX3 – each router has sent vMX1 a unique label (doesn’t have to be unique to vMX1 – just happens to be) to reach the LSP endpoint. This new next hop is available to us because the underlying IGP now has 2 equal cost paths to reach the LSP endpoint (in this case vMX4). As we learned in our last post, just because these entries are here – doesn’t mean they’ve been exported to the forwarding engines. Since the lab was rebuilt, the “per-packet” configuration is missing so our forwarding table will only show the selected next-hop…

[email protected]> show route forwarding-table family inet table default destination 10.2.2.2/31   
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
10.2.2.2/31        user     0                    indr  1048574     2
                              10.1.1.9          Push 299808      594     2 ge-0/0/2.0

[email protected]> 

If we add the per-packet configuration back in…

policy-options {
    policy-statement chassis_load_balance {
        then {
            load-balance per-packet;
        }
    }
}
routing-options {
    forwarding-table {
        export chassis_load_balance;
    }
}

We’ll now see two possible next-hops in the forwarding table with the associated labels…

[email protected]> show route forwarding-table family inet table default destination 10.2.2.2/31    
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
10.2.2.2/31        user     0                    indr  1048574     2
                                                 ulst  1048575     2
                              10.1.1.1          Push 299792      589     2 ge-0/0/1.0
                              10.1.1.9          Push 299808      594     2 ge-0/0/2.0

[email protected]> 
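
As a side note on how traffic gets spread across those two entries: despite the per-packet name, my understanding is that current Junos platforms hash on flow fields so that packets of one flow stick to one next hop. A rough Python sketch of that idea follows. The hash function, the choice of fields, and the client addresses are all assumptions for illustration and not what the forwarding engine actually computes.

```python
import hashlib

# The two next hops from the unilist entry in the forwarding table above:
# (next-hop address, egress interface, label to push).
NEXT_HOPS = [
    ("10.1.1.1", "ge-0/0/1.0", 299792),
    ("10.1.1.9", "ge-0/0/2.0", 299808),
]


def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops=NEXT_HOPS):
    """Map a flow to one next hop. The hash is hypothetical, not Junos's."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical client flow; the addresses and ports are made up for the example.
flow = ("10.2.2.1", "10.2.2.3", 6, 40000, 443)
print(pick_next_hop(*flow))
```

The point is only that the same flow always lands on the same next hop, while different flows can land on different ones.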

And if we look at our traceroute command again – we’ll see that it’s taking into account both paths which is pretty darn handy…

[email protected]> traceroute mpls ldp 4.4.4.4/32    
  Probe options: ttl 64, retries 3, wait 10, paths 16, exp 7, fanout 16

  ttl    Label  Protocol    Address          Previous Hop     Probe Status
    1   299792  LDP         10.1.1.1         (null)           Success           
  FEC-Stack-Sent: LDP 
  ttl    Label  Protocol    Address          Previous Hop     Probe Status
    2        3  LDP         10.1.1.7         10.1.1.1         Egress            
  FEC-Stack-Sent: LDP 

  Path 1 via ge-0/0/1.0 destination 127.0.0.64

  ttl    Label  Protocol    Address          Previous Hop     Probe Status
    1   299808  LDP         10.1.1.9         (null)           Success           
  FEC-Stack-Sent: LDP 
  ttl    Label  Protocol    Address          Previous Hop     Probe Status
    2        3  LDP         10.1.1.5         10.1.1.9         Egress            
  FEC-Stack-Sent: LDP 

  Path 2 via ge-0/0/2.0 destination 127.0.1.64


[email protected]>
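
The reason both paths show up can be sketched as a small shortest-path calculation over the lab topology. This is a minimal Python model, assuming every link has the default OSPF cost of 1 in both directions. The function names are mine and it only mimics what SPF does conceptually.

```python
import heapq

# Lab links with the default OSPF cost of 1 in each direction.
LINKS = [("vMX1", "vMX2"), ("vMX1", "vMX3"), ("vMX2", "vMX3"),
         ("vMX2", "vMX4"), ("vMX3", "vMX4")]


def build_graph(links, cost=1):
    """Build a symmetric adjacency map: router -> {neighbor: cost}."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def ecmp_first_hops(graph, source):
    """Dijkstra that keeps every equal-cost first hop per destination."""
    dist = {source: 0}
    first_hops = {source: set()}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbor, cost in graph[node].items():
            new_dist = d + cost
            # Leaving the source, the first hop is the neighbor itself.
            hops = {neighbor} if node == source else first_hops[node]
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                first_hops[neighbor] = set(hops)
                heapq.heappush(queue, (new_dist, neighbor))
            elif new_dist == dist[neighbor]:
                first_hops[neighbor] |= hops  # equal-cost path: merge
    return dist, first_hops


dist, first_hops = ecmp_first_hops(build_graph(LINKS), "vMX1")
print(dist["vMX4"], sorted(first_hops["vMX4"]))
```

With equal costs everywhere, vMX4 is two hops away through either vMX2 or vMX3, which is why LDP ends up installing a label from each of them.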

So meh. That wasn’t terribly exciting. We had already talked way back when about the strong connection between LDP and the underlying IGP. So let’s make this more exciting. Let’s pretend that we’ve been asked to move traffic off the cross-links between vMX1-vMX3 and vMX2-vMX4. So how do we do this? The easiest way would be to just cost up the involved interfaces in OSPF to make them less preferred. In most cases, you’d do this on both ends of the link, but since we’re interested in only seeing the outcome of this on vMX1, we’ll focus our configuration on a limited set of router interfaces. Namely, vMX1 ge-0/0/2 and vMX2 ge-0/0/2. The goal is for vMX1 to think the topology looks like this…

To do this, we’ll simply add the following commands on vMX1 and vMX2…

set protocols ospf area 0.0.0.0 interface ge-0/0/2.0 metric 20

As a refresher in OSPF – remember that metrics are calculated on ingress. That is – the receiving router tacks on its interface cost for a given destination prefix it receives through that interface. By costing up the metric of ge-0/0/2 on vMX1 we’re making all advertisements received from vMX3 look poor. The same happens on vMX2 with the routes received from vMX4. Let’s confirm this by checking out the OSPF routing table on vMX1…

[email protected]> show ospf route 
Topology default Route Table:

Prefix             Path  Route      NH       Metric NextHop       Nexthop      
                   Type  Type       Type            Interface     Address/LSP
2.2.2.2            Intra Router     IP            1 ge-0/0/1.0    10.1.1.1
3.3.3.3            Intra Router     IP            2 ge-0/0/1.0    10.1.1.1
4.4.4.4            Intra Router     IP            3 ge-0/0/1.0    10.1.1.1
1.1.1.1/32         Intra Network    IP            0 lo0.0
2.2.2.2/32         Intra Network    IP            1 ge-0/0/1.0    10.1.1.1
3.3.3.3/32         Intra Network    IP            2 ge-0/0/1.0    10.1.1.1
4.4.4.4/32         Intra Network    IP            3 ge-0/0/1.0    10.1.1.1
10.1.1.0/31        Intra Network    IP            1 ge-0/0/1.0
10.1.1.2/31        Intra Network    IP            2 ge-0/0/1.0    10.1.1.1
10.1.1.4/31        Intra Network    IP            3 ge-0/0/1.0    10.1.1.1
10.1.1.6/31        Intra Network    IP            4 ge-0/0/1.0    10.1.1.1
10.1.1.8/31        Intra Network    IP            3 ge-0/0/1.0    10.1.1.1

[email protected]> 

As you can see – the router is preferring the path through vMX2 to reach the loopbacks of vMX3 and vMX4. Note that the metric indicated for 3.3.3.3 is 2 and for 4.4.4.4 we have a metric of 3, which makes sense given that vMX4 is an additional hop (interface) away.
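
Those metrics can be reproduced by adding up, hop by hop, the cost of the interface each router uses toward the next one (which is the same interface it hears that neighbor on). Here’s a small Python sketch that does exactly that by brute force. The cost table is my reading of the lab after the change, with only vMX1 ge-0/0/2 and vMX2 ge-0/0/2 set to 20, and the helper names are made up.

```python
# Cost of each router's outgoing interface toward a neighbor. Everything is
# the default of 1 except the two interfaces that were costed up to 20.
COST = {
    ("vMX1", "vMX2"): 1, ("vMX1", "vMX3"): 20,  # vMX1 ge-0/0/2 costed up
    ("vMX2", "vMX1"): 1, ("vMX2", "vMX3"): 1,
    ("vMX2", "vMX4"): 20,                       # vMX2 ge-0/0/2 costed up
    ("vMX3", "vMX1"): 1, ("vMX3", "vMX2"): 1, ("vMX3", "vMX4"): 1,
    ("vMX4", "vMX2"): 1, ("vMX4", "vMX3"): 1,
}


def simple_paths(src, dst, visited=()):
    """Yield every loop-free path from src to dst as a list of routers."""
    visited = visited + (src,)
    if src == dst:
        yield list(visited)
        return
    for (a, b) in COST:
        if a == src and b not in visited:
            yield from simple_paths(b, dst, visited)


def path_cost(path):
    """Sum the egress interface cost at every hop along the path."""
    return sum(COST[(a, b)] for a, b in zip(path, path[1:]))


def best_paths(src, dst):
    """Return (lowest cost, list of all paths that achieve it)."""
    paths = sorted(simple_paths(src, dst), key=path_cost)
    best = path_cost(paths[0])
    return best, [p for p in paths if path_cost(p) == best]


for target in ("vMX2", "vMX3", "vMX4"):
    print(target, best_paths("vMX1", target))
```

Every best path from vMX1 now runs through vMX2 first, and the only sensible way to vMX4 is the long way around through vMX3.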

Let’s also now configure vMX3 as a BGP peer of vMX1 (yes, I know this should be a full mesh but remember we’re just interested in what vMX1 is seeing for this demonstration).

vMX1 Configuration…

set protocols bgp group internal neighbor 3.3.3.3

vMX3 Configuration…

set protocols bgp group internal type internal
set protocols bgp group internal local-address 3.3.3.3
set protocols bgp group internal export redistribute_connected_static
set protocols bgp group internal neighbor 1.1.1.1
set protocols bgp group internal neighbor 4.4.4.4
set policy-options policy-statement redistribute_connected_static from protocol direct
set policy-options policy-statement redistribute_connected_static from protocol static
set policy-options policy-statement redistribute_connected_static from prefix-list to_bgp
set policy-options policy-statement redistribute_connected_static then accept

So now let’s add in a static discard route to play around with. On vMX3 and vMX4 add the following static route…

set routing-options static route 9.9.9.9/32 discard
set policy-options prefix-list to_bgp 9.9.9.9/32

Doing so will make the prefix 9.9.9.9/32 reachable through BGP on vMX1 from both vMX3 and vMX4. Let’s validate…

[email protected]> show route table inet.0 9.9.9.9/32    

inet.0: 20 destinations, 22 routes (20 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

9.9.9.9/32         *[BGP/170] 00:00:03, localpref 100, from 3.3.3.3
                      AS path: I, validation-state: unverified
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299808
                    [BGP/170] 00:36:19, localpref 100, from 4.4.4.4
                      AS path: I, validation-state: unverified
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299792

[email protected]> 

Ok – this looks sort of reasonable. By looking at this we can see that vMX1 has selected the path from 3.3.3.3 to reach 9.9.9.9/32. But why? If we dig deeper here – we’d see that both vMX3 and vMX4 are offering up equal cost paths to reach 9.9.9.9/32 so why is it picking just one? Those of you who read my last post might have caught on that our BGP configuration on vMX1 is lacking the multipath statement, meaning that vMX1 is picking a single best path, which happens to be the one from the peer with the lowest router ID and peer IP address (3.3.3.3). So let’s add multipath into our BGP configuration and see what happens…

set protocols bgp group internal multipath

Now if we look at the routing table again…

[email protected]> show route table inet.0 9.9.9.9/32    

inet.0: 20 destinations, 22 routes (20 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

9.9.9.9/32         *[BGP/170] 00:00:22, localpref 100, from 3.3.3.3
                      AS path: I, validation-state: unverified
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299808
                      to 10.1.1.1 via ge-0/0/1.0, Push 299792
                    [BGP/170] 00:39:45, localpref 100, from 4.4.4.4
                      AS path: I, validation-state: unverified
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299792

[email protected]> 

Success! The next hop from 4.4.4.4 has been copied into the active path. vMX1 now believes that it can reach 9.9.9.9/32 from both vMX3 and vMX4. But wait a second, does this actually make sense?

The above illustration makes the problem pretty obvious. If LDP and the IGP are so closely tied together, then why is the router using a path from vMX4 to reach the same prefix which can be reached one hop closer on vMX3? The answer lies in the LDP metrics…

[email protected]> show route table inet.3 

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[LDP/9] 00:41:57, metric 1
                    > to 10.1.1.1 via ge-0/0/1.0
3.3.3.3/32         *[LDP/9] 00:30:07, metric 1
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299808
4.4.4.4/32         *[LDP/9] 00:30:07, metric 1
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299792

[email protected]> 

Note how each entry in the inet.3 table lists a metric of 1. Since we’re using LSPs to reach 9.9.9.9/32, that’s the metric the router uses to compare the LSPs available for a given prefix. So in this case, we have LSPs to 3.3.3.3 and 4.4.4.4, both of which have a metric of 1 in the inet.3 table. This seems wrong, but by default LDP routes get a metric of 1 and a preference (AD) of 9. Luckily – there’s a fix for this. We can tell LDP to inherit the underlying IGP metric with the track-igp-metric statement. Let’s apply it on vMX1 (the statement is locally significant) and see what happens…

set protocols ldp track-igp-metric 

Now let’s check out the inet.3 table again…

[email protected]> show route table inet.3               

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[LDP/9] 00:00:03, metric 1
                    > to 10.1.1.1 via ge-0/0/1.0
3.3.3.3/32         *[LDP/9] 00:00:03, metric 2
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299808
4.4.4.4/32         *[LDP/9] 00:00:03, metric 3
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299792

[email protected]>

Looking better. Notice how the routes show different metrics now. Now let’s look at the routing table entry for our destination prefix again…

[email protected]> show route table inet.0 9.9.9.9/32    

inet.0: 20 destinations, 22 routes (20 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

9.9.9.9/32         *[BGP/170] 00:00:39, localpref 100, from 3.3.3.3
                      AS path: I, validation-state: unverified
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299808
                    [BGP/170] 00:00:39, localpref 100, from 4.4.4.4
                      AS path: I, validation-state: unverified
                    > to 10.1.1.1 via ge-0/0/1.0, Push 299792

[email protected]>

And everything is right with the world again. The router has rightly decided to use the path which represents the closest IGP path to the destination prefix.
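
The change in behavior can be sketched in a few lines of Python. This is a deliberately simplified model that assumes the only tie-breakers in play are the inet.3 metric to the BGP next hop and then the lowest peer address. The real BGP decision process has many more steps, and the function name is my own.

```python
def select_paths(bgp_paths, inet3_metric, multipath=True):
    """Pick the active BGP path and any multipath contributors.

    bgp_paths: protocol next hops (peer loopbacks) advertising the prefix.
    inet3_metric: metric of the LSP route to each next hop in inet.3.
    Only the tie-breakers relevant to this lab are modelled.
    """
    def rank(next_hop):
        # Lower metric to the next hop wins; lowest address breaks the tie.
        return inet3_metric[next_hop], tuple(int(o) for o in next_hop.split("."))

    ranked = sorted(bgp_paths, key=rank)
    active = ranked[0]
    if not multipath:
        return active, [active]
    best_metric = inet3_metric[active]
    contributors = [nh for nh in ranked if inet3_metric[nh] == best_metric]
    return active, contributors


peers = ["4.4.4.4", "3.3.3.3"]

# Default LDP behavior: every inet.3 route has a metric of 1.
print(select_paths(peers, {"3.3.3.3": 1, "4.4.4.4": 1}))

# With track-igp-metric: inet.3 inherits the OSPF metrics 2 and 3.
print(select_paths(peers, {"3.3.3.3": 2, "4.4.4.4": 3}))
```

With flat metrics both peers tie and multipath pulls in the next hop from 4.4.4.4. Once the IGP metric is tracked, 3.3.3.3 wins outright and the second next hop drops out of the active route.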

So there you have it. In the next series of posts I want to start covering RSVP and then get us into some traffic engineering examples before we start diving into newer and more emerging technology. Stay tuned!
