Working with Linux VRFs


The concept of VRFs is likely one that you’re familiar with. They are the de facto standard when we talk about isolating layer 3 networks. As we’ve talked about previously, they are used extensively in applications such as MPLS VPNs and really provide the foundation for layer 3 network isolation. They do this by allowing the creation of multiple routing tables. Any layer 3 construct can then be mapped into the VRF. For instance, I could assign an IP address to an interface and then map that interface into the VRF. Likewise, I could configure a static route and specify that the route is part of a given VRF. Going one step further I could establish a BGP session off of one of the VRF interfaces and receive remote BGP routes into the VRF. VRFs are to layer 3 like VLANs are to layer 2.

So while we've talked about how they are typically used and implemented on networking hardware like routers and switches – we haven't talked about how they're implemented in Linux. They're actually fairly new to the Linux space. The functionality was written by Cumulus Networks and then contributed to the Linux kernel (kudos to them for doing that). VRFs were introduced in kernel version 4.3, which was released in late 2015. That said – they were, in my opinion, a long time coming. So what did folks do before VRFs were introduced? Let's talk about that for a moment.

There were really two options available for creating a VRF-like construct in Linux prior to VRFs landing. One option was to leverage multiple routing tables and policy routing to make something that looked VRF-like. I tried this once – I wouldn't advise it – the shortcomings far outweighed the advantages and I'd argue you'd have a heck of a hard time getting that solution past any sort of audit around network isolation. The far more popular option was to leverage network namespaces. This became incredibly popular when containers came on the scene and began using network namespaces to manage container isolation. But while network namespaces certainly work well – they are overkill for what we want. Cumulus talks about this at length in their article about VRFs so I won't reiterate their points. However it is worth calling out the major drawback at least once. Network namespaces provide total and complete isolation for "all the things". In our case, "all the things" includes the entire network stack – devices, interfaces, ARP tables, route tables, etc. Those of you familiar with how VRFs work know that's not what we normally get with VRF functionality on a "normal" (I guess that's what I'm calling vendor provided) router. This is easy to see if we look at an example VRF configuration on a normal router. For instance, let's look at a setup that looks like this…


Above we have a single router that has two interfaces. Interface ge-0/0/0 is a member of VRF vrf-1 while interface ge-0/0/1 is a member of VRF vrf-2. We also have some static routes defined in each VRF. Pretty straightforward, right? The configuration on the router might look something like this…

set interfaces ge-0/0/0 unit 0 family inet address 192.168.10.1/24
set interfaces ge-0/0/1 unit 0 family inet address 192.168.10.1/24

set routing-instances vrf-1 instance-type virtual-router
set routing-instances vrf-1 interface ge-0/0/0.0
set routing-instances vrf-1 routing-options static route 0.0.0.0/0 next-hop 192.168.10.254
set routing-instances vrf-1 routing-options static route 172.64.32.0/24 next-hop 192.168.10.101

set routing-instances vrf-2 instance-type virtual-router
set routing-instances vrf-2 interface ge-0/0/1.0
set routing-instances vrf-2 routing-options static route 0.0.0.0/0 next-hop 192.168.10.10
set routing-instances vrf-2 routing-options static route 192.168.128.0/24 next-hop 192.168.10.20

Note: In JunOS parlance, a local VRF (Cisco used to call this VRF-lite) – that is, one that is not used as part of an MPLS VPN and does not have an RD/RT assigned – is of instance type "virtual-router" instead of type "vrf".

Again – nothing crazy here in the configuration. But let’s take a look at the routing tables etc.

root@vmx1> show route table vrf-1.inet.0

vrf-1.inet.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[Static/5] 00:00:02
                    > to 192.168.10.254 via ge-0/0/0.0
172.64.32.0/24     *[Static/5] 00:00:02
                    > to 192.168.10.101 via ge-0/0/0.0
192.168.10.0/24    *[Direct/0] 00:01:10
                    > via ge-0/0/0.0
192.168.10.1/32    *[Local/0] 00:01:10
                      Local via ge-0/0/0.0

root@vmx1> show route table vrf-2.inet.0

vrf-2.inet.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[Static/5] 00:00:06
                    > to 192.168.10.10 via ge-0/0/1.0
192.168.10.0/24    *[Direct/0] 00:01:14
                    > via ge-0/0/1.0
192.168.10.1/32    *[Local/0] 00:01:14
                      Local via ge-0/0/1.0
192.168.128.0/24   *[Static/5] 00:00:06
                    > to 192.168.10.20 via ge-0/0/1.0

root@vmx1> 

The important thing to call out here is that I need to tell the router I want to look at a specific VRF's routing table. But what I don't need to do is look inside the VRF context to see things that are common to the platform. The most obvious case of this is the configuration I pasted above. I have one global configuration that is used for all the VRFs. Sure, I map things into specific VRFs, but I have a single-pane-of-glass view into all of it. Even though my interfaces are now mapped into a VRF – I can still see them on the platform without any specific command to look inside the routing instance…

root@vmx1> show interfaces terse | grep ge-0/0/
ge-0/0/0                up    up
ge-0/0/0.0              up    up   inet     192.168.10.1/24 
ge-0/0/1                up    up
ge-0/0/1.0              up    up   inet     192.168.10.1/24 
ge-0/0/2                up    down
ge-0/0/3                up    down
ge-0/0/4                up    down
ge-0/0/5                up    down
ge-0/0/6                up    down
ge-0/0/7                up    down
ge-0/0/8                up    down
ge-0/0/9                up    down

root@vmx1> 

And perhaps more importantly – services that run on the router that aren't L3 aware can just run once on the platform as a whole. That is, I don't need to run all of these services in all of the VRFs. A great example of this is something like LLDP. I've mapped my two 'up' interfaces into VRFs, but LLDP is still just running on the platform and I can still see all of the LLDP neighbors without having to run the software or command in a given VRF…

root@vmx1> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info          System Name
ge-0/0/1           -                   2c:6b:f5:4a:23:c0   ge-0/0/0           vmx3.lab            
ge-0/0/0           -                   2c:6b:f5:f9:08:c0   ge-0/0/1           vmx2.lab            

root@vmx1> 

If this still comes across as a weak argument to you, let's consider the same scenario if we were to use network namespaces on a Linux server to provide L3 isolation. Let's say we have a Linux server that looks like this…

Configuring that setup might look something like this…

ip netns add namespace-1
ip link set dev ens6 netns namespace-1
ip netns exec namespace-1 ip link set dev ens6 up
ip netns exec namespace-1 ip address add 192.168.10.1/24 dev ens6
ip netns exec namespace-1 ip route add 0.0.0.0/0 via 192.168.10.254
ip netns exec namespace-1 ip route add 172.64.32.0/24 via 192.168.10.101

ip netns add namespace-2
ip link set dev ens7 netns namespace-2
ip netns exec namespace-2 ip link set dev ens7 up
ip netns exec namespace-2 ip address add 192.168.10.1/24 dev ens7
ip netns exec namespace-2 ip route add 0.0.0.0/0 via 192.168.10.10
ip netns exec namespace-2 ip route add 192.168.128.0/24 via 192.168.10.20

Not too far off from where we were with the router. We configure the namespace, add an interface to it, and then add other routes inside the namespace. And we end up with something that looks pretty close to what we had before…

root@vm1:~# ip netns exec namespace-1 ip route show
default via 192.168.10.254 dev ens6 
172.64.32.0/24 via 192.168.10.101 dev ens6 
192.168.10.0/24 dev ens6 proto kernel scope link src 192.168.10.1 
root@vm1:~# ip netns exec namespace-2 ip route show
default via 192.168.10.10 dev ens7 
192.168.10.0/24 dev ens7 proto kernel scope link src 192.168.10.1 
192.168.128.0/24 via 192.168.10.20 dev ens7 
root@vm1:~# 

Now here’s where things begin to diverge quickly. If I wanted to run something like LLDP on this server – how would I do that? Let’s start by installing it and seeing what it sees…

root@vm1:~# apt -y install lldpd
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  snmpd
The following NEW packages will be installed:
  lldpd
0 upgraded, 1 newly installed, 0 to remove and 126 not upgraded.
Need to get 0 B/154 kB of archives.
After this operation, 511 kB of additional disk space will be used.
Selecting previously unselected package lldpd.
(Reading database ... 107391 files and directories currently installed.)
Preparing to unpack .../lldpd_1.0.4-1build2_amd64.deb ...
Unpacking lldpd (1.0.4-1build2) ...
Setting up lldpd (1.0.4-1build2) ...
Processing triggers for libc-bin (2.31-0ubuntu9.1) ...
Processing triggers for systemd (245.4-4ubuntu3.3) ...
Processing triggers for man-db (2.9.1-1) ...
root@vm1:~# 
root@vm1:~# lldpcli show interfaces
-------------------------------------------------------------------------------
LLDP interfaces:
-------------------------------------------------------------------------------
Interface:    ens3, via: unknown, Time: 0 day, 00:19:09
  Chassis:     
    ChassisID:    mac 52:ab:54:ab:01:01
    SysName:      vm1
    SysDescr:     Ubuntu 20.04.1 LTS Linux 5.9.4-050904-generic #202011042130 SMP Wed Nov 4 21:36:10 UTC 2020 x86_64
    MgmtIP:       192.168.127.1
    MgmtIP:       fe80::50ab:54ff:feab:101
    Capability:   Bridge, off
    Capability:   Router, off
    Capability:   Wlan, off
    Capability:   Station, on
  Port:        
    PortID:       mac 52:ab:54:ab:01:01
    PortDescr:    ens3
  TTL:          120
-------------------------------------------------------------------------------
root@vm1:~# lldpcli show neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
root@vm1:~# 

Notice that it doesn't detect ens6 or ens7, the two interfaces we moved into the namespaces. But this makes total sense. The interfaces are no longer in the global namespace and simply can't be seen. Again – compare this to our router where we could still see the interfaces – we had just isolated the L3 constructs. To get this to work, we'd actually need to run lldpd in each namespace. Since by default lldpd uses the same control socket, we'd need to create a unique socket for each instance. Something like this…

root@vm1:~# ip netns exec namespace-1 /usr/sbin/lldpd -u /var/run/lldpd-namespace-1.socket
root@vm1:~# ip netns exec namespace-1 lldpcli -u /var/run/lldpd-namespace-1.socket show interfaces
-------------------------------------------------------------------------------
LLDP interfaces:
-------------------------------------------------------------------------------
Interface:    ens6, via: unknown, Time: 18739 days, 20:45:55
  Chassis:     
    ChassisID:    mac 52:ab:54:cd:01:01
    SysName:      vm1
    SysDescr:     Ubuntu 20.04.1 LTS Linux 5.9.4-050904-generic #202011042130 SMP Wed Nov 4 21:36:10 UTC 2020 x86_64
    MgmtIP:       192.168.10.1
    MgmtIP:       fe80::50ab:54ff:fecd:101
    Capability:   Bridge, off
    Capability:   Router, off
    Capability:   Wlan, off
    Capability:   Station, on
  Port:        
    PortID:       mac 52:ab:54:cd:01:01
    PortDescr:    ens6
  TTL:          120
-------------------------------------------------------------------------------
root@vm1:~# ip netns exec namespace-1 lldpcli -u /var/run/lldpd-namespace-1.socket show neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    ens6, via: LLDP, RID: 1, Time: 0 day, 00:00:39
  Chassis:     
    ChassisID:    mac 52:ab:54:ab:02:01
    SysName:      vm2
    SysDescr:     Ubuntu 20.04.1 LTS Linux 5.9.4-050904-generic #202011042130 SMP Wed Nov 4 21:36:10 UTC 2020 x86_64
    MgmtIP:       192.168.127.2
    MgmtIP:       fe80::50ab:54ff:feab:201
    Capability:   Bridge, off
    Capability:   Router, off
    Capability:   Wlan, off
    Capability:   Station, on
  Port:        
    PortID:       mac 52:ab:54:cd:02:02
    PortDescr:    ens7
    TTL:          120
-------------------------------------------------------------------------------
root@vm1:~# 

So we can run another instance of lldpd in the namespace, which allows us to see the local namespace interfaces and the neighbors that are reachable out of those interfaces. The same can be done for namespace-2…

root@vm1:~# ip netns exec namespace-2 /usr/sbin/lldpd -u /var/run/lldpd-namespace-2.socket
root@vm1:~# ip netns exec namespace-2 lldpcli -u /var/run/lldpd-namespace-2.socket show interfaces
-------------------------------------------------------------------------------
LLDP interfaces:
-------------------------------------------------------------------------------
Interface:    ens7, via: unknown, Time: 18739 days, 20:47:14
  Chassis:     
    ChassisID:    mac 52:ab:54:cd:01:02
    SysName:      vm1
    SysDescr:     Ubuntu 20.04.1 LTS Linux 5.9.4-050904-generic #202011042130 SMP Wed Nov 4 21:36:10 UTC 2020 x86_64
    MgmtIP:       192.168.10.1
    MgmtIP:       fe80::50ab:54ff:fecd:102
    Capability:   Bridge, off
    Capability:   Router, off
    Capability:   Wlan, off
    Capability:   Station, on
  Port:        
    PortID:       mac 52:ab:54:cd:01:02
    PortDescr:    ens7
  TTL:          120
-------------------------------------------------------------------------------
root@vm1:~# ip netns exec namespace-2 lldpcli -u /var/run/lldpd-namespace-2.socket show neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    ens7, via: LLDP, RID: 1, Time: 0 day, 00:00:05
  Chassis:     
    ChassisID:    mac 52:ab:54:ab:03:01
    SysName:      vm3
    SysDescr:     Ubuntu 20.04.1 LTS Linux 5.9.4-050904-generic #202011042130 SMP Wed Nov 4 21:36:10 UTC 2020 x86_64
    MgmtIP:       192.168.127.3
    MgmtIP:       fe80::50ab:54ff:feab:301
    Capability:   Bridge, off
    Capability:   Router, off
    Capability:   Wlan, off
    Capability:   Station, on
  Port:        
    PortID:       mac 52:ab:54:cd:03:02
    PortDescr:    ens7
    TTL:          120
-------------------------------------------------------------------------------
root@vm1:~# 

Ok – but now we have three separate lldpd instances running (each with its own monitor and worker process)…

root@vm1:~# ps -fC lldpd
UID          PID    PPID  C STIME TTY          TIME CMD
_lldpd      2304       1  0 12:53 ?        00:00:00 lldpd: monitor. 
_lldpd      2306    2304  0 12:53 ?        00:00:00 lldpd: no neighbor.
_lldpd      2501       1  0 12:53 ?        00:00:00 lldpd: monitor. 
_lldpd      2503    2501  0 12:53 ?        00:00:00 lldpd: connected to vm2.
_lldpd      2511       1  0 12:56 ?        00:00:00 lldpd: monitor. 
_lldpd      2513    2511  0 12:56 ?        00:00:00 lldpd: connected to vm3.
root@vm1:~# 

There's really no need to isolate things that don't need to be isolated – the only thing we gain by doing so is more overhead on the host. That said – I hope this shows that while namespaces do provide the necessary isolation – they are overkill for what we're trying to achieve. As Cumulus said in their article: "Network Namespace as a VRF? Just say No".

So now that we've looked at the other possible options for solving this problem – let's talk about using actual VRFs in Linux. Same setup as above – but now with VRFs!


The VRF implementation might initially feel a little different than what you're used to – but once you start using it I'd argue it makes good sense. The nice thing about the VRF implementation in Linux is that it's all done through existing tooling. That's huge! All we need is the iproute2 package, which in most cases is already on your system. Once it's there – all you need to do to create a VRF is this…

root@vm1:~# ip link add vrf-1 type vrf table 1
root@vm1:~# ip link set dev vrf-1 up
root@vm1:~# ip vrf
Name              Table
-----------------------
vrf-1                1
root@vm1:~# 

Easy! So now we have a VRF – what do we do with it? Well – we can add interfaces to the VRF quite easily…

root@vm1:~# ip link set dev ens6 master vrf-1
root@vm1:~# ip addr add 192.168.10.1/24 dev ens6
root@vm1:~# 
root@vm1:~# ip route show vrf vrf-1
192.168.10.0/24 dev ens6 proto kernel scope link src 192.168.10.1 
root@vm1:~# 

Above we add ens6 to the VRF vrf-1 and then add an IP address to it. Then we can look at the vrf-1 routing table using the ip route show vrf syntax. I'll also point out that even though ens6 is now in the VRF, it has not disappeared from the global view the way it did with network namespaces…

root@vm1:~# ip -br link
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
ens3             UP             52:ab:54:ab:01:01 <BROADCAST,MULTICAST,UP,LOWER_UP> 
ens6             UP             52:ab:54:cd:01:01 <BROADCAST,MULTICAST,UP,LOWER_UP> 
ens7             UP             52:ab:54:cd:01:02 <BROADCAST,MULTICAST,UP,LOWER_UP> 
vrf-1            UP             4e:19:81:5c:ad:ee <NOARP,MASTER,UP,LOWER_UP> 
root@vm1:~# 

So now that we have an interface in the VRF – we can add static routes…

root@vm1:~# ip route add 0.0.0.0/0 via 192.168.10.254 vrf vrf-1
root@vm1:~# ip route add 172.64.32.0/24 via 192.168.10.101 vrf vrf-1
root@vm1:~# ip route show vrf vrf-1
default via 192.168.10.254 dev ens6 
172.64.32.0/24 via 192.168.10.101 dev ens6 
192.168.10.0/24 dev ens6 proto kernel scope link src 192.168.10.1 
root@vm1:~# 

Let’s now configure vrf-2 all in one shot…

ip link add vrf-2 type vrf table 2
ip link set dev vrf-2 up
ip link set dev ens7 master vrf-2
ip addr add 192.168.10.1/24 dev ens7
ip route add 0.0.0.0/0 via 192.168.10.10 vrf vrf-2
ip route add 192.168.128.0/24 via 192.168.10.20 vrf vrf-2
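
If you want to sanity check vrf-2 the same way we did vrf-1, the same show commands apply (output omitted here, but it should mirror the vrf-1 examples above)…

ip vrf
ip route show vrf vrf-2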

So now that we have the full configuration – let's back up for a second. When we created the VRF – it looked a lot like we were creating another network device. In fact, we were – the device that gets created is called a "layer 3 master device" (l3mdev for short). This device acts as a sort of central point that all of the VRF's interfaces connect to. The process of adding interfaces to the VRF might have looked familiar if you've worked with Linux bridges before. If so – it's because it's done in the exact same way! When the VRF functionality was designed, it was modeled on the way you add devices to a bridge domain. The mapping even looks the same if we look at the interfaces…

root@vm1:~# ip -d link show dev ens6
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vrf-1 state UP mode DEFAULT group default qlen 1000
    link/ether 52:ab:54:cd:01:01 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 
    vrf_slave table 1 addrgenmode eui64 numtxqueues 2 numrxqueues 2 gso_max_size 65536 gso_max_segs 65535 
    altname enp0s6
root@vm1:~# 

See how it shows that the master is vrf-1, and in the details we can see the VRF table number as well? Not only does this make it feel more natural to folks already familiar with these concepts – it also means that you can do some interesting things with the VRF device. For instance, what if you want to see traffic in your VRF? No problem – just run tcpdump on the VRF interface…

root@vm1:~# tcpdump -nnel -i vrf-1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vrf-1, link-type EN10MB (Ethernet), capture size 262144 bytes
21:04:59.140107 42:f6:86:4a:a6:17 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 98: 192.168.10.1 > 192.168.10.254: ICMP echo request, id 2, seq 466, length 64
21:04:59.140575 52:ab:54:cd:02:02 > 52:ab:54:cd:01:01, ethertype IPv4 (0x0800), length 98: 192.168.10.254 > 192.168.10.1: ICMP echo reply, id 2, seq 466, length 64
21:05:00.164085 42:f6:86:4a:a6:17 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 98: 192.168.10.1 > 192.168.10.254: ICMP echo request, id 2, seq 467, length 64
21:05:00.164390 52:ab:54:cd:02:02 > 52:ab:54:cd:01:01, ethertype IPv4 (0x0800), length 98: 192.168.10.254 > 192.168.10.1: ICMP echo reply, id 2, seq 467, length 64
21:05:01.188086 42:f6:86:4a:a6:17 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 98: 192.168.10.1 > 192.168.10.254: ICMP echo request, id 2, seq 468, length 64
21:05:01.188320 52:ab:54:cd:02:02 > 52:ab:54:cd:01:01, ethertype IPv4 (0x0800), length 98: 192.168.10.254 > 192.168.10.1: ICMP echo reply, id 2, seq 468, length 64

Same thing works for ping – we can just set the source interface to the VRF device name and…

root@vm1:~# ping -I vrf-1 192.168.10.254
ping: Warning: source address might be selected on device other than: vrf-1
PING 192.168.10.254 (192.168.10.254) from 192.168.10.1 vrf-1: 56(84) bytes of data.
64 bytes from 192.168.10.254: icmp_seq=1 ttl=64 time=0.250 ms
64 bytes from 192.168.10.254: icmp_seq=2 ttl=64 time=0.253 ms
64 bytes from 192.168.10.254: icmp_seq=3 ttl=64 time=0.444 ms

We can also add IP addresses to the VRF interface itself…

root@vm1:~# ip addr add 1.1.1.1/32 dev vrf-1
root@vm1:~# ip route get vrf vrf-1 1.1.1.1
local 1.1.1.1 dev vrf-1 table 1 src 1.1.1.1 uid 0 
    cache <local> 
root@vm1:~# 

This IP can be used just as you would use a loopback interface inside the VRF.
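
For instance, if your iproute2 and kernel support the ip vrf exec helper (it relies on cgroup/BPF support, so treat this as a sketch rather than a guarantee for your distro), you can run a command bound to the VRF and source traffic from that loopback address…

ip vrf exec vrf-1 ping -I 1.1.1.1 192.168.10.254

Keep in mind the far end would still need a route back to 1.1.1.1 for the replies to make it home.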

So at this point we've shown how you can create and use VRFs – but we haven't really talked about how traffic actually lands in the VRF's routing table. We know that when we created the VRF we had to assign it a routing table number. When dealing with extra routing tables we generally need rules to steer lookups into them, so let's take a look at the existing IP rules…

root@vm1:~# ip rule show
0:	from all lookup local
1000:	from all lookup [l3mdev-table]
32766:	from all lookup main
32767:	from all lookup default
root@vm1:~# 

Notice rule 1000. Now look at a different box that you haven’t configured any VRFs on…

root@vm2:~# ip rule show
0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default
root@vm2:~# 

Notice that rule 1000 isn't present on the box with no VRFs defined. Rule 1000 is created when a VRF is configured and it's what allows the VRF lookups to magically work. But let's dig in here for a minute. What does the IP rule set on vm1 actually do? More importantly, what does rule 0 – the rule listed before our magic VRF lookup rule – do?

Rule 0 exists by default everywhere and provides local lookups. So what does that mean?

root@vm1:~# ip route show table local
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1 
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1 
broadcast 192.168.127.0 dev ens3 proto kernel scope link src 192.168.127.1 
local 192.168.127.1 dev ens3 proto kernel scope host src 192.168.127.1 
broadcast 192.168.127.255 dev ens3 proto kernel scope link src 192.168.127.1 
root@vm1:~# 

This table holds all of the local routes – that is, routes for directly connected or local addresses. You'll note a lack of any routes for prefixes or interfaces that exist in either of our VRFs. Makes sense – VRFs are a layer 3 construct, so we wouldn't want routes, even local ones, from a VRF sitting in a table where they don't belong. The catch here, though, is how the rule processing happens. We have a total of four rules…

  • Rule 0 – Local Lookups
  • Rule 1000 – L3MDEV (Our magic VRF lookup rule)
  • Rule 32766 – The main table lookup (the table routes land in by default)
  • Rule 32767 – The "default" table lookup (a separate table that is normally empty)

Rules are processed in order from lowest to highest preference. So any time we need to do a route lookup – this processing happens until we find a match, regardless of where the lookup is happening. This is why it's important that the kernel creates rule 1000 to do VRF lookups before the lookup falls into the main table. Some of you might already be seeing a problem with this – but let's table that for now while we talk about how rule 1000 works.

So what would happen if rule 1000 wasn't there? Currently, if we ping from our VRF out of one of our VRF interfaces, this works…

root@vm1:~# ping -I vrf-1 192.168.10.254
ping: Warning: source address might be selected on device other than: vrf-1
PING 192.168.10.254 (192.168.10.254) from 192.168.10.1 vrf-1: 56(84) bytes of data.
64 bytes from 192.168.10.254: icmp_seq=1 ttl=64 time=0.437 ms
64 bytes from 192.168.10.254: icmp_seq=2 ttl=64 time=0.260 ms
64 bytes from 192.168.10.254: icmp_seq=3 ttl=64 time=0.186 ms

Now let’s delete rule 1000 and try again…

root@vm1:~# ip rule del pref 1000
root@vm1:~# ping -I vrf-1 192.168.10.254 -c 2
ping: Warning: source address might be selected on device other than: vrf-1
PING 192.168.10.254 (192.168.10.254) from 192.168.127.1 vrf-1: 56(84) bytes of data.

--- 192.168.10.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1017ms

root@vm1:~# 

No dice. So let’s do a route lookup in the VRF and see what’s happening…

root@vm1:~# ip route get vrf vrf-1 192.168.10.254
192.168.10.254 via 192.168.127.100 dev ens3 src 192.168.127.1 uid 0 
    cache 
root@vm1:~# ip route
default via 192.168.127.100 dev ens3 proto static 
10.20.30.0/24 via 192.168.127.100 dev ens3 proto static 
192.168.127.0/24 dev ens3 proto kernel scope link src 192.168.127.1 
root@vm1:~# 

So as expected – the route lookup is falling through to the main table and coming back with a route that isn't in our VRF. Rule 1000, the l3mdev lookup rule, solves this by telling the kernel to do the lookup in the VRF's own table. However – rule 1000 is strictly a convenience that covers the table lookups for every VRF. If we wanted to, we could manually add the rules for a given VRF as well…

root@vm1:~# ip rule add oif vrf-1 table 1
root@vm1:~# ip rule add iif vrf-1 table 1
root@vm1:~# ip route get vrf vrf-1 192.168.10.254
192.168.10.254 dev ens6 table 1 src 192.168.10.1 uid 0 
    cache 
root@vm1:~# ping -I vrf-1 192.168.10.254 -c 2
ping: Warning: source address might be selected on device other than: vrf-1
PING 192.168.10.254 (192.168.10.254) from 192.168.10.1 vrf-1: 56(84) bytes of data.
64 bytes from 192.168.10.254: icmp_seq=1 ttl=64 time=0.447 ms
64 bytes from 192.168.10.254: icmp_seq=2 ttl=64 time=0.275 ms

--- 192.168.10.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1005ms
rtt min/avg/max/mdev = 0.275/0.361/0.447/0.086 ms
root@vm1:~# 

Above we add two rules that say "if out of interface vrf-1 then look up in table 1" as well as "if in interface vrf-1 then look up in table 1". So while you could certainly add these rules for each VRF you create, it's easier to just let the default l3mdev rule take care of this for you. When you have a single rule that points the lookup at the l3mdev-table, it covers the bases for all of the l3mdev (VRF) interfaces. So let's go ahead and put that rule back and clean up our other two rules…

root@vm1:~# ip rule show
0:	from all lookup local
32764:	from all iif vrf-1 lookup 1
32765:	from all oif vrf-1 lookup 1
32766:	from all lookup main
32767:	from all lookup default
root@vm1:~# ip rule del pref 32764
root@vm1:~# ip rule del pref 32765
root@vm1:~# ip rule add l3mdev pref 1000
root@vm1:~# ip rule show
0:	from all lookup local
1000:	from all lookup [l3mdev-table]
32766:	from all lookup main
32767:	from all lookup default
root@vm1:~# 

Now this all seems to be working as expected – but let's get back to the possible issue I hinted at earlier. As I mentioned, the rule lookup happens from lower to higher preference, so the first rule that will be evaluated is rule 0. Remember – rule 0 points at the local table, which holds the routes for all of the local and directly connected addresses. So what happens when a destination our VRF is trying to reach also shows up as a local route in the global table?…

root@vm1:~# ping -I vrf-1 192.168.10.254 -c 2
ping: Warning: source address might be selected on device other than: vrf-1
PING 192.168.10.254 (192.168.10.254) from 192.168.10.1 vrf-1: 56(84) bytes of data.
64 bytes from 192.168.10.254: icmp_seq=1 ttl=64 time=0.368 ms
64 bytes from 192.168.10.254: icmp_seq=2 ttl=64 time=0.322 ms

--- 192.168.10.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1031ms
rtt min/avg/max/mdev = 0.322/0.345/0.368/0.023 ms
root@vm1:~# ip addr add 192.168.10.254/32 dev lo
root@vm1:~# ping -I vrf-1 192.168.10.254 -c 2
ping: Warning: source address might be selected on device other than: vrf-1
PING 192.168.10.254 (192.168.10.254) from 192.168.10.254 vrf-1: 56(84) bytes of data.

--- 192.168.10.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1030ms
root@vm1:~# 

No worky. And if we check the routes…

root@vm1:~# ip route get vrf vrf-1 192.168.10.254
local 192.168.10.254 dev lo table local src 192.168.10.254 uid 0 
    cache <local> 
root@vm1:~# 

As expected, the route lookup is getting caught by the local table lookup and never making its way to the l3mdev rule. To fix this – it's usually recommended to reorder the rules like so…

root@vm1:~# ip -4 rule add pref 32765 table local
root@vm1:~# ip -4 rule del pref 0
root@vm1:~# ip rule show
1000:	from all lookup [l3mdev-table]
32765:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default
root@vm1:~# 
root@vm1:~# ping -I vrf-1 192.168.10.254 -c 2
ping: Warning: source address might be selected on device other than: vrf-1
PING 192.168.10.254 (192.168.10.254) from 192.168.10.1 vrf-1: 56(84) bytes of data.
64 bytes from 192.168.10.254: icmp_seq=1 ttl=64 time=0.564 ms
64 bytes from 192.168.10.254: icmp_seq=2 ttl=64 time=0.293 ms

--- 192.168.10.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1030ms
rtt min/avg/max/mdev = 0.293/0.428/0.564/0.135 ms
root@vm1:~# 

This moves the local lookup to after rule 1000 so the VRF lookups can always occur first. Problem solved!
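
One caveat worth calling out: the commands above only reordered the IPv4 rules. If you're also carrying IPv6 inside your VRFs, the same local-rule reordering would likely need to be applied to the IPv6 rule set – a quick sketch, assuming the default IPv6 rules are still in place…

ip -6 rule add pref 32765 table local
ip -6 rule del pref 0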

Now that we've gone through how they work, let's talk about a couple of extra handy commands to use when working with VRFs…

root@vm1:~# ip -br addr show master vrf-1
ens6             UP             192.168.10.1/24 fe80::50ab:54ff:fecd:101/64 
root@vm1:~# 

If you want to see all of the IP addresses in a given VRF, you can use the master option of the ip addr command to list just the addresses for that VRF. Likewise, to see all of the links in a VRF, you can use the same syntax with the ip link command…

root@vm1:~# ip link show master vrf-1
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vrf-1 state UP mode DEFAULT group default qlen 1000
    link/ether 52:ab:54:cd:01:01 brd ff:ff:ff:ff:ff:ff
    altname enp0s6
root@vm1:~# 

This should give you a good idea of how to get up and running with Linux VRFs. If nothing else, it should be a good reminder of why VRFs are valuable. They provide a super lightweight construct for L3 isolation while still allowing common, non-L3-aware services to run once for the whole host, which helps keep the overhead down.
