Multipath routing with VPNv4

When we talk about VPNv4 prefixes – Route Distinguishers (RDs) play an incredibly important role in enabling multipath routing. We talked about this a little bit in our last post, but I want to hit it home in this post as well as cover a couple of other items in a little greater detail.

To start with, we’re going to use our same physical lab topology, but change things up slightly. Namely…

  • vMX7 will remain a BGP route reflector but will also now participate in the dataplane by having its interfaces enabled with LDP and MPLS
  • vMX1 and vMX3 will act as remote PEs for our provider that are hosting an anycast service. Both will be advertising 140.10.20.0/24 into a “Provider” VPN which we will then import into our customer VPN.
  • vMX2 will now act like a Customer Edge (CE) router peering to vMX5 which will act as the provider edge (PE) router. It will peer with the provider in AS65000 from AS12345.

So our diagram will now look something like this…

Alright. So for the sake of thoroughness, I’ll start by including our base configurations again since they did change ever so slightly, and I am going to include the LDP and MPLS configurations from the get go since we talked about why those were needed in the last post. Nothing below should be new to you, but if it is – take a gander at the last post...

vMX1

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 140.10.20.1/24;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.0/31;
            }
            family mpls;
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 1.1.1.1/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 1.1.1.1;
}
protocols {
    mpls {                              
        interface ge-0/0/1.0;
    }
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
        }
    }
    ldp {
        interface ge-0/0/1.0;
    }
}

vMX2

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.10.20.1/24;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.2/31;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 2.2.2.2/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 2.2.2.2;
}

vMX3

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 140.10.20.1/24;
            }
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.4/31;
            }
            family mpls;
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 3.3.3.3/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 3.3.3.3;
}
protocols {
    mpls {                              
        interface ge-0/0/1.0;
    }
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
        }
    }
    ldp {
        interface ge-0/0/1.0;
    }
}

vMX4

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 169.254.10.1/31;
            }
            family mpls;
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.8/31;
            }
            family mpls;
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 169.254.10.7/31;
            }
            family mpls;
        }
    }
    ge-0/0/3 {
        unit 0 {
            family inet {
                address 169.254.10.15/31;
            }
            family mpls;
        }
    }
    lo0 {
        unit 0 {                        
            family inet {
                no-redirects;
                address 4.4.4.4/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 4.4.4.4;
}
protocols {
    mpls {
        interface ge-0/0/0.0;
        interface ge-0/0/1.0;
        interface ge-0/0/2.0;
        interface ge-0/0/3.0;
    }
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0 {
                interface-type p2p;
            }
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
            interface ge-0/0/2.0 {
                interface-type p2p;
            }
            interface ge-0/0/3.0 {
                interface-type p2p;
            }
        }
    }
    ldp {
        interface ge-0/0/0.0;
        interface ge-0/0/1.0;
        interface ge-0/0/2.0;           
        interface ge-0/0/3.0;
    }
}

vMX5

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 169.254.10.9/31;
            }
            family mpls;
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.10/31;
            }
            family mpls;
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 169.254.10.3/31;
            }
        }
    }
    ge-0/0/3 {
        unit 0 {
            family inet {
                address 169.254.10.13/31;
            }
            family mpls;
        }
    }
    lo0 {
        unit 0 {
            family inet {               
                no-redirects;
                address 5.5.5.5/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 5.5.5.5;
}
protocols {
    mpls {
        interface ge-0/0/0.0;
        interface ge-0/0/1.0;
        interface ge-0/0/3.0;
    }
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0 {
                interface-type p2p;
            }
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
            interface ge-0/0/3.0 {
                interface-type p2p;
            }
        }
    }
    ldp {
        interface ge-0/0/0.0;
        interface ge-0/0/1.0;
        interface ge-0/0/3.0;
    }
}

vMX6

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 169.254.10.6/31;
            }
            family mpls;
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.11/31;
            }
            family mpls;
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 169.254.10.5/31;
            }
            family mpls;
        }
    }
    ge-0/0/3 {
        unit 0 {
            family inet {
                address 169.254.10.17/31;
            }
            family mpls;
        }
    }
    lo0 {
        unit 0 {                        
            family inet {
                no-redirects;
                address 6.6.6.6/32 {
                    primary;
                }
            }
        }
    }
}
routing-options {
    router-id 6.6.6.6;
}
protocols {
    mpls {
        interface ge-0/0/0.0;
        interface ge-0/0/1.0;
        interface ge-0/0/2.0;
        interface ge-0/0/3.0;
    }
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0 {
                interface-type p2p;
            }
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
            interface ge-0/0/2.0 {
                interface-type p2p;
            }
            interface ge-0/0/3.0 {
                interface-type p2p;
            }
        }
    }
    ldp {
        interface ge-0/0/0.0;
        interface ge-0/0/1.0;
        interface ge-0/0/2.0;           
        interface ge-0/0/3.0;
    }
}

vMX7

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 169.254.10.12/31;
            }
            family mpls;
        }
    }
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 169.254.10.16/31;
            }
            family mpls;
        }
    }
    ge-0/0/2 {
        unit 0 {
            family inet {
                address 169.254.10.14/31;
            }
            family mpls;
        }
    }
    lo0 {
        unit 0 {
            family inet {
                no-redirects;
                address 7.7.7.7/32 {
                    primary;
                }
            }
        }
    }                                   
}
routing-options {
    router-id 7.7.7.7;
}
protocols {
    mpls {
        interface ge-0/0/0.0;
        interface ge-0/0/1.0;
        interface ge-0/0/2.0;
    }
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/0.0 {
                interface-type p2p;
            }
            interface ge-0/0/1.0 {
                interface-type p2p;
            }
            interface ge-0/0/2.0 {
                interface-type p2p;
            }
        }
    }
    ldp {
        interface ge-0/0/0.0;
        interface ge-0/0/1.0;
        interface ge-0/0/2.0;
    }
}

Alright – so now we have to configure our VRFs (local routing-instances) and BGP so we can advertise customer VPNv4 routes. Let’s get started…

vMX1 (A provider PE)

set routing-instances provider instance-type vrf
set routing-instances provider interface ge-0/0/0.0
set routing-instances provider route-distinguisher 65000:1
set routing-instances provider vrf-table-label 

set policy-options community customer_a members target:65000:12345
set policy-options community into_provider members target:65000:1
set policy-options community outof_provider members target:65000:2

set policy-options policy-statement provider_vrf_import term all from community into_provider
set policy-options policy-statement provider_vrf_import term all from community outof_provider
set policy-options policy-statement provider_vrf_import term all then accept
set policy-options policy-statement provider_vrf_import then reject

set policy-options policy-statement provider_vrf_export term all then community add outof_provider
set policy-options policy-statement provider_vrf_export term all then accept
set policy-options policy-statement provider_vrf_export then reject

set routing-instances provider vrf-export provider_vrf_export
set routing-instances provider vrf-import provider_vrf_import

set routing-options autonomous-system 65000   
set protocols bgp group internal peer-as 65000
set protocols bgp group internal local-address 1.1.1.1
set protocols bgp group internal neighbor 7.7.7.7
set protocols bgp group internal family inet-vpn unicast

Ok, let’s talk through some of this because there are some things that are worth explaining. The initial block configures our “provider” routing instance, maps an interface into it, and assigns an RD. We’ve done this before, so this should be pretty self-explanatory. The next block defines a set of communities that we’re going to use in the subsequent VRF import/export policy. There are 3 communities that will be used as follows…

customer_a – A route target used to describe routes from customer A
into_provider – A route target used to describe routes that we want to go into the provider VRF. Since we want to be able to talk to prefixes originated from the provider network, this will include customer A prefixes that will source traffic toward the provider to facilitate return routing.
outof_provider – A route target used to describe routes that are advertised out of the provider network. These would be the prefixes that we want to import into the customer VRF so that we can reach them from the customer.

The into and outof naming is meant to try and help you visualize what we’re trying to accomplish here but you may still be wondering why a VRF needs more than one target. So let’s talk through that to make sure we’re on the same page.

Let’s visualize an environment where we have many customers and one provider. Imagine that each customer is defined by a single route target but you also had a “provider” or “common” VRF that each customer expected to access. Let’s also say that this provider VRF was defined by a single route target. How might you go about allowing all of the customer VRFs access to the provider VRF? There are a few ways to do this. You might initially think of configuring the VRF import and export as follows…

Note: While not important in these diagrams (since there’s only one logical instance of each VRF) I show that each VRF always imports its own RT since there could be multiple instances of the VRF at different places on the network. Don’t let that confuse you in the below depictions.

However – if you do this, you’ll need to update the VRF import policy on the provider network each time you add another customer. Not ideal. Your next iteration might look something like this…

Can you see the problem with the above design? This seems like an improvement since all of the work is done on the customer VRFs, which seems doable as part of customer provisioning. However, what we’ve really just accomplished here is advertising all customer networks to all other customers. If all customer VRFs import and export the provider route target, then every customer will see every other customer’s routes, which sort of defeats the purpose of what we’re trying to do here with VRFs. The real solution here is to use distinct import and export route targets for the provider network…

In doing so, the provider VRF imports and exports with distinct route targets. This means that each customer can import the provider exported routes AND the provider can import a single route target (into_provider) that every customer exports. In our example, routes coming from the provider are associated with the target outof_provider, meaning they are advertised out of the provider VRF into the backbone. These routes then need to be imported into each customer VRF. Then each customer exports its routes with its own unique route target as well as the into_provider route target, which the provider network then imports. This prevents customer prefixes from being imported into other customer VRFs while still allowing bidirectional communication between customers and the provider.
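To convince ourselves of the difference between the last two designs, here is a minimal Python sketch that models route-target import and export. It is a toy model and not how Junos evaluates policy: customer_b, its prefix (10.10.30.0/24) and its target (target:65000:54321) are hypothetical additions, and the helper names are made up for illustration.

```python
# Toy model of route-target import/export. VRF names and most targets mirror
# the lab; customer_b, its prefix and its target are hypothetical additions.

CUST_A = "target:65000:12345"
CUST_B = "target:65000:54321"      # hypothetical second customer
INTO_PROVIDER = "target:65000:1"
OUTOF_PROVIDER = "target:65000:2"


def make_vrf(prefixes, imports, exports):
    """Describe a VRF by its local prefixes and its import/export targets."""
    return {"prefixes": prefixes, "import": set(imports), "export": set(exports)}


def imported(vrfs):
    """Return {vrf name: sorted prefixes it imports from the other VRFs}."""
    # Every VRF exports its local prefixes tagged with its export targets.
    routes = [
        (name, prefix, cfg["export"])
        for name, cfg in vrfs.items()
        for prefix in cfg["prefixes"]
    ]
    # A VRF imports a route when any of the route's targets is in its import set.
    return {
        name: sorted(
            prefix
            for origin, prefix, targets in routes
            if origin != name and targets & cfg["import"]
        )
        for name, cfg in vrfs.items()
    }


# Flawed design: a single provider target that everyone imports and exports.
ONE_RT = "target:65000:1"
shared = {
    "provider": make_vrf(["140.10.20.0/24"], [ONE_RT], [ONE_RT]),
    "customer_a": make_vrf(["10.10.20.0/24"], [CUST_A, ONE_RT], [CUST_A, ONE_RT]),
    "customer_b": make_vrf(["10.10.30.0/24"], [CUST_B, ONE_RT], [CUST_B, ONE_RT]),
}

# Final design: distinct import and export targets on the provider VRF.
split = {
    "provider": make_vrf(["140.10.20.0/24"],
                         [INTO_PROVIDER, OUTOF_PROVIDER], [OUTOF_PROVIDER]),
    "customer_a": make_vrf(["10.10.20.0/24"],
                           [CUST_A, OUTOF_PROVIDER], [CUST_A, INTO_PROVIDER]),
    "customer_b": make_vrf(["10.10.30.0/24"],
                           [CUST_B, OUTOF_PROVIDER], [CUST_B, INTO_PROVIDER]),
}

print(imported(shared)["customer_a"])  # ['10.10.30.0/24', '140.10.20.0/24']
print(imported(split)["customer_a"])   # ['140.10.20.0/24']
print(imported(split)["provider"])     # ['10.10.20.0/24', '10.10.30.0/24']
```

With the shared target, customer A ends up with customer B’s prefix. With the split targets, each customer only learns the provider prefix while the provider still learns every customer prefix.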

The rest of the configuration sets up the policy as defined above and then applies it to the provider VRF. We wrap things up by setting up a BGP peering to vMX7 which will still act as our route reflector in this lab.

vMX3 (A provider PE)

set routing-instances provider instance-type vrf
set routing-instances provider interface ge-0/0/0.0
set routing-instances provider route-distinguisher 65000:1
set routing-instances provider vrf-table-label 

set policy-options community customer_a members target:65000:12345
set policy-options community into_provider members target:65000:1
set policy-options community outof_provider members target:65000:2

set policy-options policy-statement provider_vrf_import term all from community into_provider
set policy-options policy-statement provider_vrf_import term all from community outof_provider
set policy-options policy-statement provider_vrf_import term all then accept
set policy-options policy-statement provider_vrf_import then reject

set policy-options policy-statement provider_vrf_export term all then community add outof_provider
set policy-options policy-statement provider_vrf_export term all then accept
set policy-options policy-statement provider_vrf_export then reject

set routing-instances provider vrf-export provider_vrf_export
set routing-instances provider vrf-import provider_vrf_import

set routing-options autonomous-system 65000   
set protocols bgp group internal peer-as 65000
set protocols bgp group internal local-address 3.3.3.3
set protocols bgp group internal neighbor 7.7.7.7
set protocols bgp group internal family inet-vpn unicast

The above configuration should be almost identical to that of vMX1 with the exception of the BGP configuration which specifies a different BGP local address.

vMX5 (The customer PE)

set policy-options community customer_a members target:65000:12345
set policy-options community into_provider members target:65000:1
set policy-options community outof_provider members target:65000:2

set policy-options policy-statement cust_a_vrf_import term all from community customer_a
set policy-options policy-statement cust_a_vrf_import term all from community outof_provider
set policy-options policy-statement cust_a_vrf_import term all then accept
set policy-options policy-statement cust_a_vrf_import then reject

set policy-options policy-statement cust_a_vrf_export term all then community add customer_a
set policy-options policy-statement cust_a_vrf_export term all then community add into_provider
set policy-options policy-statement cust_a_vrf_export term all then accept
set policy-options policy-statement cust_a_vrf_export then reject

set routing-options autonomous-system 65000   
set protocols bgp group internal peer-as 65000
set protocols bgp group internal local-address 5.5.5.5
set protocols bgp group internal neighbor 7.7.7.7
set protocols bgp group internal family inet-vpn unicast

set routing-instances customer_a instance-type vrf
set routing-instances customer_a interface ge-0/0/2.0
set routing-instances customer_a route-distinguisher 65000:12345
set routing-instances customer_a vrf-table-label
set routing-instances customer_a vrf-import cust_a_vrf_import
set routing-instances customer_a vrf-export cust_a_vrf_export
set routing-instances customer_a protocols bgp group customer_a peer-as 12345
set routing-instances customer_a protocols bgp group customer_a neighbor 169.254.10.2
set routing-instances customer_a protocols bgp group customer_a type external

Now that I’ve talked through the import and export policy for the provider, the above policy for the customer should also make sense. We export our routes with the unique customer route target as well as the into_provider route target to allow the customer prefixes to be imported into the provider VRF. Since our customer wants to peer to the provider from a locally managed CE (customer edge) router, we need to define a BGP session within the customer A routing instance. This means that vMX5 will support both the iBGP peering (in the default or global table) to the backbone route reflector (vMX7) as well as a normal IPv4 unicast BGP peering to the customer A CE (within the customer A routing instance). The configuration of the CE peering should look familiar to you with the exception that it’s done within a routing-instance.

vMX7 (Backbone route reflector)

set routing-options autonomous-system 65000   
set protocols bgp group internal peer-as 65000
set protocols bgp group internal local-address 7.7.7.7
set protocols bgp group internal neighbor 1.1.1.1
set protocols bgp group internal neighbor 5.5.5.5
set protocols bgp group internal neighbor 3.3.3.3
set protocols bgp group internal cluster 0.0.0.0
set protocols bgp group internal family inet-vpn unicast

This configuration should also look familiar to you from the last post. The only difference you should notice is that we’re now peering to the new customer PE (vMX5) instead of vMX2.

vMX2 (The customer CE)

set routing-options autonomous-system 12345
set protocols bgp group provider type external
set protocols bgp group provider peer-as 65000
set protocols bgp group provider neighbor 169.254.10.3
set policy-options policy-statement customer_a_adv term connected from protocol direct  
set policy-options policy-statement customer_a_adv term connected from prefix-list to_provider          
set policy-options policy-statement customer_a_adv term connected then accept                     
set policy-options policy-statement customer_a_adv then reject                   
set policy-options prefix-list to_provider 10.10.20.0/24 
set protocols bgp group provider export customer_a_adv 

And lastly, on vMX2, we have our most basic configuration. A simple BGP peering session in the global table (no VRFs) with an export policy that advertises the network connected to endpoint 2 (10.10.20.0/24) to its BGP peer (the PE).

Once the above configuration is in place, our customer endpoint (endpoint 2) should be able to reach the anycast address space being advertised from both provider PEs (140.10.20.0/24), specifically the provider endpoint of 140.10.20.100 which is hosted on endpoint 1 and endpoint 3…

root@endpoint2:/root# ping 140.10.20.100
PING 140.10.20.100 (140.10.20.100) 56(84) bytes of data.
64 bytes from 140.10.20.100: icmp_seq=1 ttl=60 time=2.40 ms
64 bytes from 140.10.20.100: icmp_seq=2 ttl=60 time=4.64 ms
64 bytes from 140.10.20.100: icmp_seq=3 ttl=60 time=5.64 ms
64 bytes from 140.10.20.100: icmp_seq=4 ttl=60 time=2.24 ms
64 bytes from 140.10.20.100: icmp_seq=5 ttl=60 time=2.16 ms

Great! So we’re up and running and our provider import and export policy seems to be working as expected. Let’s now take a look at the routing table of the customer CE (vMX2) and the customer PE (vMX5)…

[email protected]> show route table inet.0 

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[Direct/0] 1d 00:32:19
                    > via lo0.0
10.10.20.0/24      *[Direct/0] 1d 00:30:03
                    > via ge-0/0/0.0
10.10.20.1/32      *[Local/0] 1d 00:30:03
                      Local via ge-0/0/0.0
10.171.200.0/22    *[Static/5] 1d 00:32:19
                    > to 192.168.127.100 via fxp0.0
140.10.20.0/24     *[BGP/170] 13:12:58, localpref 100
                      AS path: 65000 I, validation-state: unverified
                    > to 169.254.10.3 via ge-0/0/1.0
169.254.10.2/31    *[Direct/0] 1d 00:30:03
                    > via ge-0/0/1.0
169.254.10.2/32    *[Local/0] 1d 00:30:03
                      Local via ge-0/0/1.0
192.168.127.0/24   *[Direct/0] 1d 00:32:19
                    > via fxp0.0
192.168.127.2/32   *[Local/0] 1d 00:32:19
                      Local via fxp0.0
224.0.0.5/32       *[OSPF/10] 1d 00:32:20, metric 1
                      MultiRecv

[email protected]> 

We see that vMX2 has a single route to reach the remote anycast prefix of 140.10.20.0/24 which is being advertised from both provider PEs.

[email protected]> show route table customer_a.inet.0 

customer_a.inet.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.20.0/24      *[BGP/170] 13:14:03, localpref 100
                      AS path: 12345 I, validation-state: unverified
                    > to 169.254.10.2 via ge-0/0/2.0
140.10.20.0/24     *[BGP/170] 13:14:03, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.8 via ge-0/0/0.0, Push 16, Push 299776(top)
169.254.10.2/31    *[Direct/0] 15:41:25
                    > via ge-0/0/2.0
169.254.10.3/32    *[Local/0] 15:41:25
                      Local via ge-0/0/2.0

[email protected]> 

If we look at the customer A routing instance on the PE (vMX5) we also see that it has one route for 140.10.20.0/24. So now we should start asking ourselves why we don’t see two paths to reach the anycast prefix, one from vMX1 and one from vMX3. So let’s go one more router upstream and check to see what the route reflector is seeing…

[email protected]> show route table bgp.l3vpn.0 

bgp.l3vpn.0: 3 destinations, 4 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

65000:1:140.10.20.0/24                
                   *[BGP/170] 16:01:19, localpref 100, from 1.1.1.1
                      AS path: I, validation-state: unverified
                    > to 169.254.10.15 via ge-0/0/2.0, Push 16, Push 299776(top)
                    [BGP/170] 16:01:11, localpref 100, from 3.3.3.3
                      AS path: I, validation-state: unverified
                    > to 169.254.10.17 via ge-0/0/1.0, Push 16, Push 299808(top)
65000:12345:10.10.20.0/24                
                   *[BGP/170] 00:01:33, localpref 100, from 5.5.5.5
                      AS path: 12345 I, validation-state: unverified
                    > to 169.254.10.13 via ge-0/0/0.0, Push 16
65000:12345:169.254.10.2/31                
                   *[BGP/170] 00:01:38, localpref 100, from 5.5.5.5
                      AS path: I, validation-state: unverified
                    > to 169.254.10.13 via ge-0/0/0.0, Push 16

[email protected]> 

On the route reflector we can see that it does have two paths to 140.10.20.0/24: one from 1.1.1.1 and one from 3.3.3.3. However, note that both are paths for the same VPNv4 prefix, 65000:1:140.10.20.0/24. So the route reflector will pick its best path for that VPNv4 prefix and advertise only that one to its other BGP peers. Currently it’s decided that it prefers the path from 1.1.1.1. We can validate that this is the route which is being advertised to other peers…

[email protected]> show route advertising-protocol bgp 5.5.5.5 

bgp.l3vpn.0: 3 destinations, 4 routes (3 active, 0 holddown, 0 hidden)
  Prefix		  Nexthop	       MED     Lclpref    AS path
  65000:1:140.10.20.0/24                    
*                         1.1.1.1                      100        I

[email protected]> 
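If it helps to see why the second path disappears, the selection can be sketched in a few lines of Python. This is only a model under stated assumptions: the function name `reflect` is made up, and the tie-break (lowest originator address) is a stand-in for the full BGP decision process, used here only because every other attribute in the lab is equal.

```python
import ipaddress
from collections import defaultdict


def reflect(paths):
    """Pick one best path per VPNv4 NLRI, roughly as a route reflector would.

    `paths` is a list of (rd, prefix, originator) tuples. The tie-break
    (lowest originator address) is a simplification, not the real algorithm.
    """
    by_nlri = defaultdict(list)
    for rd, prefix, originator in paths:
        # The RD is part of the key: same RD + same prefix = same NLRI.
        by_nlri[f"{rd}:{prefix}"].append(originator)
    return {
        nlri: min(originators, key=ipaddress.ip_address)
        for nlri, originators in by_nlri.items()
    }


received = [
    ("65000:1", "140.10.20.0/24", "1.1.1.1"),     # from vMX1
    ("65000:1", "140.10.20.0/24", "3.3.3.3"),     # from vMX3
    ("65000:12345", "10.10.20.0/24", "5.5.5.5"),  # from vMX5
]
best = reflect(received)
print(best["65000:1:140.10.20.0/24"])  # 1.1.1.1
print(len(best))                       # 2
```

Three paths go in but only two NLRIs come out, because the two provider PEs produced an identical key.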

So at this point, traffic coming from the customer CE will only ever take the path over to the anycast prefix advertised off of vMX1. We can validate this by doing captures on endpoints 1 and 3 and running a series of tests from endpoint 2. To do this we’ll use a set of nc commands with varying source ports from endpoint 2 toward the anycast address…

nc -p 1111 140.10.20.100 1234
nc -p 2222 140.10.20.100 1234
nc -p 3333 140.10.20.100 1234
nc -p 4444 140.10.20.100 1234
nc -p 5555 140.10.20.100 1234
nc -p 6666 140.10.20.100 1234
nc -p 7777 140.10.20.100 1234
nc -p 8888 140.10.20.100 1234
nc -p 9999 140.10.20.100 1234
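If you’d rather script those probes than type them, a rough Python equivalent might look like the sketch below. The function names are hypothetical, and it assumes it is run from endpoint 2 with nothing else bound to those source ports. Like the nc commands, it only varies the source port so that each attempt is a distinct flow.

```python
import socket


def source_ports():
    """Source ports used above: 1111, 2222, ... 9999."""
    return [n * 1111 for n in range(1, 10)]


def probe(dst, dport, sport, timeout=2.0):
    """Open one TCP connection from a fixed source port and report the result."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Allow quick reuse of the same source port between test runs.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.settimeout(timeout)
        s.bind(("", sport))
        try:
            s.connect((dst, dport))
            return "connected"
        except ConnectionRefusedError:
            # An RST came back, so the SYN still reached one of the endpoints.
            return "refused"
        except socket.timeout:
            return "timeout"


def run_probes(dst, dport):
    """Probe dst:dport once from each source port."""
    return {sport: probe(dst, dport, sport) for sport in source_ports()}
```

Calling `run_probes("140.10.20.100", 1234)` would send one SYN per source port. Nothing is listening on port 1234 in this lab, so a "refused" result is expected and is enough to show up in the captures.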

If we look at the capture from endpoint 3, we’ll see no traffic because it’s all landing on endpoint 1…

root@endpoint1:/root# tcpdump -lnne -i eth0 port 1234
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
52:de:52:ef:01:01 > 66:85:71:96:e3:bb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.1111 > 140.10.20.100.1234: Flags [S], seq 2688396495, win 29200, options [mss 1460,nop,nop,sackOK,no
66:85:71:96:e3:bb > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.1111: Flags [R.], seq 0, ack 3203247241, win 0, length 0
52:de:52:ef:01:01 > 66:85:71:96:e3:bb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.2222 > 140.10.20.100.1234: Flags [S], seq 343620096, win 29200, options [mss 1460,nop,nop,sackOK,nop
66:85:71:96:e3:bb > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.2222: Flags [R.], seq 0, ack 1813536486, win 0, length 0
52:de:52:ef:01:01 > 66:85:71:96:e3:bb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.3333 > 140.10.20.100.1234: Flags [S], seq 491322140, win 29200, options [mss 1460,nop,nop,sackOK,nop
66:85:71:96:e3:bb > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.3333: Flags [R.], seq 0, ack 3203310456, win 0, length 0
52:de:52:ef:01:01 > 66:85:71:96:e3:bb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.4444 > 140.10.20.100.1234: Flags [S], seq 2964237456, win 29200, options [mss 1460,nop,nop,sackOK,no
66:85:71:96:e3:bb > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.4444: Flags [R.], seq 0, ack 1813285690, win 0, length 0
52:de:52:ef:01:01 > 66:85:71:96:e3:bb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.5555 > 140.10.20.100.1234: Flags [S], seq 3367347745, win 29200, options [mss 1460,nop,nop,sackOK,no
66:85:71:96:e3:bb > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.5555: Flags [R.], seq 0, ack 1813162985, win 0, length 0
52:de:52:ef:01:01 > 66:85:71:96:e3:bb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.6666 > 140.10.20.100.1234: Flags [S], seq 22142917, win 29200, options [mss 1460,nop,nop,sackOK,nop,
66:85:71:96:e3:bb > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.6666: Flags [R.], seq 0, ack 1813223922, win 0, length 0
52:de:52:ef:01:01 > 66:85:71:96:e3:bb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.7777 > 140.10.20.100.1234: Flags [S], seq 1537454679, win 29200, options [mss 1460,nop,nop,sackOK,no
66:85:71:96:e3:bb > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.7777: Flags [R.], seq 0, ack 3203312363, win 0, length 0
52:de:52:ef:01:01 > 66:85:71:96:e3:bb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.8888 > 140.10.20.100.1234: Flags [S], seq 998689150, win 29200, options [mss 1460,nop,nop,sackOK,nop
66:85:71:96:e3:bb > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.8888: Flags [R.], seq 0, ack 1813286516, win 0, length 0
52:de:52:ef:01:01 > 66:85:71:96:e3:bb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.9999 > 140.10.20.100.1234: Flags [S], seq 3533233105, win 29200, options [mss 1460,nop,nop,sackOK,no
66:85:71:96:e3:bb > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.9999: Flags [R.], seq 0, ack 1813537850, win 0, length 0

So how do we fix this? Well one option (and the one I wish to pursue in this post) is to use different RDs. In my previous posts, we’ve used a common deployment methodology where we aligned the RT with the RD for each VRF. This is what we’ve done in this post as well…

Provider RT -> 65000:1
Provider RD -> 65000:1

Customer A RT -> 65000:12345
Customer A RD -> 65000:12345

Note that on the provider side we’re also using 65000:2 as part of our import/export scheme as described above, but otherwise we’re aligning the RT and the RD. The problem with this is that every PE advertising into a given VRF uses the same RD, so they all send the same VPNv4 routes for the same prefixes. AKA – when each router creates a VPNv4 prefix from an IPv4 prefix, the results end up looking the same. This causes the route reflector to see the routes as the same. What we really want is for the route reflector to think they are different. The only way for them to look different is for us to use different RDs on vMX1 and vMX3. In doing so we’ll be changing our methodology for assigning RDs to a per-router RD allocation in which RTs and RDs are decoupled entirely. In this model you can think of the RD as more of an “originator ID”. That is, since each router will have a unique RD, you’ll always know where a route came from. So let’s change our RD on vMX3 to be 65000:3

set routing-instances provider route-distinguisher 65000:3

Once committed – let’s now look at the routes on the route reflector again…

[email protected]> show route table bgp.l3vpn.0                   

bgp.l3vpn.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

65000:1:140.10.20.0/24                
                   *[BGP/170] 16:13:48, localpref 100, from 1.1.1.1
                      AS path: I, validation-state: unverified
                    > to 169.254.10.15 via ge-0/0/2.0, Push 16, Push 299776(top)
65000:3:140.10.20.0/24                
                   *[BGP/170] 00:00:29, localpref 100, from 3.3.3.3
                      AS path: I, validation-state: unverified
                    > to 169.254.10.17 via ge-0/0/1.0, Push 16, Push 299808(top)
65000:12345:10.10.20.0/24                
                   *[BGP/170] 00:14:02, localpref 100, from 5.5.5.5
                      AS path: 12345 I, validation-state: unverified
                    > to 169.254.10.13 via ge-0/0/0.0, Push 16
65000:12345:169.254.10.2/31                
                   *[BGP/170] 00:14:07, localpref 100, from 5.5.5.5
                      AS path: I, validation-state: unverified
                    > to 169.254.10.13 via ge-0/0/0.0, Push 16

[email protected]> 

Nice! So now you can see we have two distinct VPNv4 routes, one from vMX1 and one from vMX3. This also means that vMX7 will be advertising these two routes to vMX5…

[email protected]> show route advertising-protocol bgp 5.5.5.5    

bgp.l3vpn.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
  Prefix		  Nexthop	       MED     Lclpref    AS path
  65000:1:140.10.20.0/24                    
*                         1.1.1.1                      100        I
  65000:3:140.10.20.0/24                    
*                         3.3.3.3                      100        I

[email protected]> 

Awesome. So now let’s look on vMX5…

[email protected]> show route table customer_a.inet.0    

customer_a.inet.0: 4 destinations, 5 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.20.0/24      *[BGP/170] 00:15:37, localpref 100
                      AS path: 12345 I, validation-state: unverified
                    > to 169.254.10.2 via ge-0/0/2.0
140.10.20.0/24     *[BGP/170] 00:15:42, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.8 via ge-0/0/0.0, Push 16, Push 299776(top)
                    [BGP/170] 00:02:04, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.11 via ge-0/0/1.0, Push 16, Push 299808(top)
169.254.10.2/31    *[Direct/0] 00:15:41
                    > via ge-0/0/2.0
169.254.10.3/32    *[Local/0] 00:15:41
                      Local via ge-0/0/2.0

[email protected]> 

Cool. So both routes are there – but if you look you’ll notice that the router is still only picking one that it wants to use (the entry marked with the * above). The other will only be selected if the current primary becomes less preferred. Well this isn’t what we want – that’s not multipath at all. You might have read my previous post where we discussed multipath and the need for some additional flags to tell the router to use multiple paths. Specifically, we have to enable multipath in BGP so that it knows to use multiple equal cost routes, and we also have to tell the router’s forwarding plane to actually use multiple routes.

Let’s tackle them in order. In previous posts we simply enabled multipath on the BGP group to tell BGP to use multiple equal cost paths. However, since we’re in a CE/PE scenario, the method for enabling multipath is actually different. Rather than enabling it in the BGP protocol, we do it in the routing options of the routing-instance as follows…

set routing-instances customer_a routing-options multipath

The result of this is a change in our routing table…

[email protected]> show route table customer_a.inet.0 

customer_a.inet.0: 4 destinations, 6 routes (4 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

10.10.20.0/24      *[BGP/170] 00:25:40, localpref 100
                      AS path: 12345 I, validation-state: unverified
                    > to 169.254.10.2 via ge-0/0/2.0
140.10.20.0/24     @[BGP/170] 00:25:45, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.8 via ge-0/0/0.0, Push 16, Push 299776(top)
                    [BGP/170] 00:12:07, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.11 via ge-0/0/1.0, Push 16, Push 299808(top)
                   #[Multipath/255] 00:01:22, metric2 1
                    > to 169.254.10.8 via ge-0/0/0.0, Push 16, Push 299776(top)
                      to 169.254.10.11 via ge-0/0/1.0, Push 16, Push 299808(top)
169.254.10.2/31    *[Direct/0] 00:25:44
                    > via ge-0/0/2.0
169.254.10.3/32    *[Local/0] 00:25:44
                      Local via ge-0/0/2.0

[email protected]> 

Notice that we now have two sections underneath the 140.10.20.0/24 route. One demarcated by a @ and one by a #. If we look at the key above we’ll see that the @ signifies “Routing use only” while the # denotes “Forwarding use only”. But why the differentiation?

Let’s take a step back here for a moment and think about what’s going on. Since vMX5 is the PE for customer A it has a couple of jobs…

  • Receive IPv4 routes from the customer A CE router, turn them into VPNv4 routes, and send them into the backbone.
  • Receive VPNv4 routes from the backbone and import the ones that match the customer A import policy in the customer A routing table as IPv4 prefixes.

Our problem now is that while our two routes are now unique VPNv4 routes (since we changed the RD on vMX3) they are not unique IPv4 routes. Since vMX5 is a PE it has the special knowledge required to understand VPNv4 prefixes, which means that it can still treat the routes as unique. But when it tries to put them into the customer A routing table, they can no longer be unique since, once the RD is stripped, they are duplicate IPv4 prefixes. So what happens is the router turns both VPNv4 prefixes into multipath forwarding destinations when it imports the VPNv4 routes into the customer A routing table. We can see this as part of this entry…

                   #[Multipath/255] 00:01:22, metric2 1
                    > to 169.254.10.8 via ge-0/0/0.0, Push 16, Push 299776(top)
                      to 169.254.10.11 via ge-0/0/1.0, Push 16, Push 299808(top)

But since we’re leaving the backbone we’re also losing our ability to comprehend VPNv4 prefixes. In doing so, we can no longer advertise 2 unique prefixes to the CE, we have to pick one which is what we see happening in the “Routing Use Only” blocks…

140.10.20.0/24     @[BGP/170] 00:25:45, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.8 via ge-0/0/0.0, Push 16, Push 299776(top)
                    [BGP/170] 00:12:07, localpref 100, from 7.7.7.7
                      AS path: I, validation-state: unverified
                    > to 169.254.10.11 via ge-0/0/1.0, Push 16, Push 299808(top)
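If it helps, the import step can be sketched in a few lines of Python. Again this is a toy model under stated assumptions: `import_into_vrf` and the dictionary field names are hypothetical, the first-path-wins selection stands in for the real BGP decision process, and I’m assuming the customer VRF imports on target:65000:2 since that is the community we see on the route. It shows the RD being dropped on import, the two paths landing under one IPv4 prefix, and the split between what is used for forwarding and what can be advertised to the CE.

```python
# Toy model of a PE importing VPNv4 routes into a VRF.
# All names here are illustrative assumptions, not Junos internals.

def import_into_vrf(vpnv4_routes, import_targets, multipath=True):
    """Import matching routes, dropping the RD, and summarise each IPv4 prefix."""
    by_prefix = {}
    for route in vpnv4_routes:
        if route["rt"] in import_targets:
            # The RD is ignored from here on: only the IPv4 prefix is the key.
            by_prefix.setdefault(route["prefix"], []).append(route)

    vrf = {}
    for prefix, paths in by_prefix.items():
        best = paths[0]  # stand-in for the real BGP best-path decision
        candidates = paths if multipath else [best]
        vrf[prefix] = {
            # Only one IPv4 route can be sent to the CE.
            "advertise_to_ce": best["origin"],
            # With multipath enabled, every equal path is a forwarding candidate.
            "multipath_next_hops": [p["next_hop"] for p in candidates],
        }
    return vrf


routes = [
    {"rd": "65000:1", "prefix": "140.10.20.0/24", "rt": "target:65000:2",
     "origin": "vMX1", "next_hop": "169.254.10.8"},
    {"rd": "65000:3", "prefix": "140.10.20.0/24", "rt": "target:65000:2",
     "origin": "vMX3", "next_hop": "169.254.10.11"},
]

entry = import_into_vrf(routes, {"target:65000:2"})["140.10.20.0/24"]
print(entry)
```

Calling the same function with `multipath=False` leaves a single next hop in the list, which roughly mirrors what the table looked like before we enabled multipath in the routing instance.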

So vMX5 will need to pick which prefix it thinks is best and only advertise that prefix to the CE. In this case, it’s preferring the path to vMX1, but that’s sort of hard to tell at this point…

[email protected]> show route advertising-protocol bgp 169.254.10.2 extensive 

customer_a.inet.0: 4 destinations, 6 routes (4 active, 0 holddown, 0 hidden)
@ 140.10.20.0/24 (3 entries, 2 announced)
 BGP group customer_a type External
     Nexthop: Self
     AS path: [65000] I 
     Communities: target:65000:2

[email protected]> 

Nothing about the above output shows us that this is the route from vMX1. If we’re crafty though, we can tack on a dummy community to the route coming from vMX1 so we can be sure. To do that, apply this configuration on vMX1…

set policy-options community dummy members 12:34
set policy-options policy-statement provider_vrf_export term all then community add dummy

Since communities are transitive attributes, this new dummy community will show up when we advertise the route to the CE…

[email protected]> show route advertising-protocol bgp 169.254.10.2 extensive    

customer_a.inet.0: 4 destinations, 6 routes (4 active, 0 holddown, 0 hidden)
@ 140.10.20.0/24 (3 entries, 2 announced)
 BGP group customer_a type External
     Nexthop: Self
     AS path: [65000] I 
     Communities: 12:34 target:65000:2

[email protected]> 

And there we go. We can now tell that the route being sent to the CE is in fact the one coming from vMX1. But at this point, you might be wondering – “Should I care?”. And the answer to that is “not really”. The CE at this point only really cares about getting to the PE, and by sending a route (regardless of source) for 140.10.20.0/24 to the CE we’re accomplishing that task. We only have one physical path in this case and we don’t even have an opportunity for multipathing until we get to vMX5. Since vMX5 is now installing multiple forwarding entries in the customer A routing table, we should be all set.

But now we actually need to tell the MX to use both entries on the forwarding side. If we look at the forwarding table now, we’ll see that the router is still only using one path…

[email protected]> show route forwarding-table destination 140.10.20.100/24 table customer_a 
Routing table: customer_a.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
140.10.20.0/24     user     0                    indr  1048574     2
                              169.254.10.8      Push 16, Push 299776(top)      620     2 ge-0/0/0.0

[email protected]> 

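To fix this, we need to tell the chassis to do per-packet load balancing by exporting a load-balancing policy to the forwarding table. Despite the keyword, on modern Junos forwarding hardware this results in per-flow hashing, so the packets of any one flow stay on one path. Let’s put this configuration on vMX5…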

set policy-options policy-statement chassis_load_balance then load-balance per-packet
set routing-options forwarding-table export chassis_load_balance

Now if we check again we should see…

[email protected]> show route forwarding-table destination 140.10.20.100/24 table customer_a    
Routing table: customer_a.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
140.10.20.0/24     user     0                    ulst  1048579     2
                                                 indr  1048574     2
                              169.254.10.8      Push 16, Push 299776(top)      620     2 ge-0/0/0.0
                                                 indr  1048578     2
                              169.254.10.11     Push 16, Push 299808(top)      619     2 ge-0/0/1.0

[email protected]> 

Awesome! Now if we rerun our nc test from endpoint 2, we should see some traffic landing on both endpoint 1 and endpoint 3…

Capture from endpoint 1

root@endpoint1:/root# tcpdump -lnne -i eth0 port 1234
17:41:52.587420 52:de:52:ef:01:01 > 32:e7:96:7e:fd:51, ethertype IPv4 (0x0800), length 66: 10.10.20.100.2222 > 140.10.20.100.1234: Flags [S], seq 2108129438, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:41:52.587451 32:e7:96:7e:fd:51 > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.2222: Flags [R.], seq 0, ack 2108129439, win 0, length 0
17:41:52.707242 52:de:52:ef:01:01 > 32:e7:96:7e:fd:51, ethertype IPv4 (0x0800), length 66: 10.10.20.100.4444 > 140.10.20.100.1234: Flags [S], seq 434031490, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:41:52.707258 32:e7:96:7e:fd:51 > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.4444: Flags [R.], seq 0, ack 434031491, win 0, length 0
17:41:52.763343 52:de:52:ef:01:01 > 32:e7:96:7e:fd:51, ethertype IPv4 (0x0800), length 66: 10.10.20.100.5555 > 140.10.20.100.1234: Flags [S], seq 837264673, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:41:52.763358 32:e7:96:7e:fd:51 > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.5555: Flags [R.], seq 0, ack 837264674, win 0, length 0
17:41:52.807339 52:de:52:ef:01:01 > 32:e7:96:7e:fd:51, ethertype IPv4 (0x0800), length 66: 10.10.20.100.6666 > 140.10.20.100.1234: Flags [S], seq 1786717058, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:41:52.807352 32:e7:96:7e:fd:51 > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.6666: Flags [R.], seq 0, ack 1786717059, win 0, length 0
17:41:52.931398 52:de:52:ef:01:01 > 32:e7:96:7e:fd:51, ethertype IPv4 (0x0800), length 66: 10.10.20.100.8888 > 140.10.20.100.1234: Flags [S], seq 2763450330, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:41:52.931414 32:e7:96:7e:fd:51 > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.8888: Flags [R.], seq 0, ack 2763450331, win 0, length 0
17:41:52.987495 52:de:52:ef:01:01 > 32:e7:96:7e:fd:51, ethertype IPv4 (0x0800), length 66: 10.10.20.100.9999 > 140.10.20.100.1234: Flags [S], seq 996712757, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:41:52.987509 32:e7:96:7e:fd:51 > 52:de:52:ef:01:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.9999: Flags [R.], seq 0, ack 996712758, win 0, length 0

Capture from endpoint 3

root@endpoint3:/root# tcpdump -lnne -i eth0 port 1234
17:41:52.535729 52:de:52:ef:03:01 > 3e:91:be:16:9f:cb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.1111 > 140.10.20.100.1234: Flags [S], seq 158129505, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:41:52.535763 3e:91:be:16:9f:cb > 52:de:52:ef:03:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.1111: Flags [R.], seq 0, ack 158129506, win 0, length 0
17:41:52.643348 52:de:52:ef:03:01 > 3e:91:be:16:9f:cb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.3333 > 140.10.20.100.1234: Flags [S], seq 2255895776, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:41:52.643363 3e:91:be:16:9f:cb > 52:de:52:ef:03:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.3333: Flags [R.], seq 0, ack 2255895777, win 0, length 0
17:41:52.871287 52:de:52:ef:03:01 > 3e:91:be:16:9f:cb, ethertype IPv4 (0x0800), length 66: 10.10.20.100.7777 > 140.10.20.100.1234: Flags [S], seq 3302152700, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:41:52.871302 3e:91:be:16:9f:cb > 52:de:52:ef:03:01, ethertype IPv4 (0x0800), length 54: 140.10.20.100.1234 > 10.10.20.100.7777: Flags [R.], seq 0, ack 3302152701, win 0, length 0

Great! So we are seeing some load balancing between the 2 anycast provider PEs.
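The way connections get split across the two PEs can be pictured with a short Python sketch. This assumes the forwarding plane hashes on the flow’s 5-tuple, and it uses CRC32 purely as a stand-in since the real hash is platform specific and not something I’m trying to reproduce here. The `pick_next_hop` function is hypothetical, so don’t expect it to match the port-to-endpoint split in the captures above.

```python
# Toy model of per-flow load balancing across two next hops.
# CRC32 is a stand-in hash; the real forwarding-plane hash is platform specific.
import zlib

NEXT_HOPS = ["169.254.10.8", "169.254.10.11"]


def pick_next_hop(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Hash the flow's 5-tuple and map the result onto one of the next hops."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    return NEXT_HOPS[zlib.crc32(key) % len(NEXT_HOPS)]


# Same source ports as the nc test, all aimed at the anycast address.
ports = [1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999]
choices = {
    port: pick_next_hop("10.10.20.100", port, "140.10.20.100", 1234)
    for port in ports
}

for port, next_hop in choices.items():
    print(port, next_hop)
```

The thing to take away is that the choice is made per flow. A given source port always maps to the same next hop, so the split between the two PEs depends on how the flows happen to hash rather than being an even round robin.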

So we took a long road to get here, covering some other topics along the way, but the main point I wanted to emphasize was how you can make VPNv4 prefixes unique in a provider backbone. This should reinforce the point from the last post that RTs and RDs serve two totally different purposes and how you allocate them can make a big difference in how the network behaves. In the next post, we’ll keep digging down into the details of how some of these features work and talk about alternative architectures.
