IPAM and DNS with CNI

In the first post of this series we talked about some of the CNI basics.  We then followed that up with a second post showing a more real world example of how you could use CNI to network a container.  We’ve covered IPAM lightly at this point since CNI relies on it for IP allocation but we haven’t talked about what it’s doing or how it works.  In addition – DNS was discussed from a parameter perspective in the first post where we talked about the CNI spec but that’s about it.  The reason for that is that CNI doesn’t actually configure container DNS.  Confused?  I was too.  I mean why is it in the spec if I can’t configure it?

To answer these questions, and see how IPAM and DNS work with CNI, I think a deep dive into an actual CNI implementation would be helpful.  That is – let’s look at a tool that actually implements CNI to see how it uses it.  To do that we’re going to look at the container runtime from the folks at CoreOS – Rocket (rkt).  Rkt can be installed fairly easily using this set of commands…

wget https://github.com/coreos/rkt/releases/download/v1.25.0/rkt_1.25.0-1_amd64.deb
wget https://github.com/coreos/rkt/releases/download/v1.25.0/rkt_1.25.0-1_amd64.deb.asc
gpg --keyserver keys.gnupg.net --recv-key 18AD5014C99EF7E3BA5F6CE950BDD3E0FC8A365E
gpg --verify rkt_1.25.0-1_amd64.deb.asc
sudo dpkg -i rkt_1.25.0-1_amd64.deb

After you install rkt check to make sure it’s working…

user@ubuntu-1:~$ sudo rkt version
rkt Version: 1.25.0
appc Version: 0.8.10
Go Version: go1.7.4
Go OS/Arch: linux/amd64
Features: -TPM +SDJOURNAL
user@ubuntu-1:~$

Note: This post is not intended to be a ‘How to get started with rkt’ guide.  I might do something similar in the future but right now the focus is on CNI.

Great so now what? I mentioned above that rkt implements CNI. In other words, rkt uses CNI to configure a container's network interface. Before we jump into that though – let's talk about what's already in place from the work we did in the first two posts. Let's take a look at some files on the system to see what CNI has done up to this point…

user@ubuntu-1:~/cni$ sudo su
root@ubuntu-1:/home/user/cni# cd /var/lib/cni/networks
root@ubuntu-1:/var/lib/cni/networks# ls
mybridge
root@ubuntu-1:/var/lib/cni/networks#

Notice we switched over to the root user to make looking at these files easier. If we look in the ‘/var/lib/cni/networks’ path we should see a directory using the name of the network we defined. If you go back and look at the two previous posts you’ll notice that despite the networks being different – I neglected to change the name of the network between definitions. I only changed the ‘bridge’ parameter. If we look in the ‘mybridge’ folder we should see a few files…

root@ubuntu-1:/var/lib/cni/networks# cd mybridge/
root@ubuntu-1:/var/lib/cni/networks/mybridge# ls
10.15.20.2  10.15.30.100  last_reserved_ip
root@ubuntu-1:/var/lib/cni/networks/mybridge# more 10.15.20.2
1234567890
root@ubuntu-1:/var/lib/cni/networks/mybridge# more 10.15.30.100
1018026ebc02fa0cbf2be35325f4833ec1086cf6364c7b2cf17d80255d7d4a27
root@ubuntu-1:/var/lib/cni/networks/mybridge# more last_reserved_ip
10.15.30.100
root@ubuntu-1:/var/lib/cni/networks/mybridge#

Looking at the files we see some familiar values. The ‘10.15.20.2’ file has ‘1234567890’ in it, which is the name of the network namespace from the first post. The ‘10.15.30.100’ file has the value of ‘1018026ebc02fa0cbf2be35325f4833ec1086cf6364c7b2cf17d80255d7d4a27’, which is the container ID we passed to CNI when we connected a Docker container with CNI in the second post. The last file is called ‘last_reserved_ip’ and has the value of 10.15.30.100 in it. The last_reserved_ip file is a helper file that records the last IP handed out so CNI knows where to start looking for the next free address. In this case, the last allocation was 10.15.30.100 out of the 10.15.30.0/24 network, so that's the IP it lists.
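Since each allocation is just a file, you can also clean things up by hand if a container ever exits uncleanly and leaves its reservation behind. A minimal sketch (only do this if you're certain nothing is still using the address):

root@ubuntu-1:/var/lib/cni/networks/mybridge# rm ./10.15.30.100

Once the file is gone, host-local considers that address free and can hand it out again.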

So why are these files here?  Well they're here because in both of the previous posts we told CNI to use the ‘host-local’ IPAM driver.  This is what host-local does: it stores all of the allocations locally on the host.  Pretty straightforward.  Let's create another network definition on this host and use it in conjunction with rkt so you can see it in action…

root@ubuntu-1:~# mkdir /etc/rkt/net.d
root@ubuntu-1:~# cd /etc/rkt/net.d
root@ubuntu-1:/etc/rkt/net.d#
root@ubuntu-1:/etc/rkt/net.d# cat > custom_rkt_bridge.conf <<"EOF"
> {
>     "cniVersion": "0.2.0",
>     "name": "customrktbridge",
>     "type": "bridge",
>     "bridge": "cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "ipam": {
>         "type": "host-local",
>         "subnet": "10.11.0.0/16",
>         "routes": [
>             { "dst": "0.0.0.0/0" }
>         ]
>     }
> }
> EOF
root@ubuntu-1:/etc/rkt/net.d#

The first thing we want to do is to create a new network definition.  In the previous posts, we were storing that in our ‘~/cni’ directory and passing it directly to the CNI plugin.  This time we want rkt to consume the configuration, so we need to put it where rkt can find it.  The default directory rkt searches for network configuration files is ‘/etc/rkt/net.d/’, so we'll create the ‘net.d’ directory and then create this new network configuration in it.  Notice that the name of this network is ‘customrktbridge’.  Now let's run a simple container on the host using rkt…

user@ubuntu-1:~$ sudo rkt run --interactive --net=customrktbridge quay.io/coreos/alpine-sh
pubkey: prefix: "quay.io/coreos/alpine-sh"
key: "https://quay.io/aci-signing-key"
gpg key fingerprint is: BFF3 13CD AA56 0B16 A898 7B8F 72AB F5F6 799D 33BC
 Quay.io ACI Converter (ACI conversion signing key) <[email protected]>
Are you sure you want to trust this key (yes/no)?
yes
Trusting "https://quay.io/aci-signing-key" for prefix "quay.io/coreos/alpine-sh" after fingerprint review.
Added key for prefix "quay.io/coreos/alpine-sh" at "/etc/rkt/trustedkeys/prefix.d/quay.io/coreos/alpine-sh/bff313cdaa560b16a8987b8f72abf5f6799d33bc"
Downloading signature: [=======================================] 473 B/473 B
Downloading ACI: [=============================================] 2.65 MB/2.65 MB
image: signature verified:
 Quay.io ACI Converter (ACI conversion signing key) <[email protected]>
/ #
/ # ifconfig
eth0 Link encap:Ethernet HWaddr 62:5C:46:9F:57:3A
 inet addr:10.11.0.2 Bcast:0.0.0.0 Mask:255.255.0.0
 inet6 addr: fe80::605c:46ff:fe9f:573a/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:6 errors:0 dropped:0 overruns:0 frame:0
 TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0
 RX bytes:508 (508.0 B) TX bytes:578 (578.0 B)

eth1 Link encap:Ethernet HWaddr A2:EE:49:17:03:EA
 inet addr:172.17.0.2 Bcast:0.0.0.0 Mask:255.255.0.0
 inet6 addr: fe80::a0ee:49ff:fe17:3ea/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:7 errors:0 dropped:0 overruns:0 frame:0
 TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0
 RX bytes:578 (578.0 B) TX bytes:508 (508.0 B)

lo Link encap:Local Loopback
 inet addr:127.0.0.1 Mask:255.0.0.0
 inet6 addr: ::1/128 Scope:Host
 UP LOOPBACK RUNNING MTU:65536 Metric:1
 RX packets:0 errors:0 dropped:0 overruns:0 frame:0
 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1
 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

/ #
/ # Container rkt-596c724a-f3de-4892-aebf-83529d0f386f terminated by signal KILL.
user@ubuntu-1:~$

To exit the container's interactive shell press the key sequence Ctrl + ] three times.

The command we executed told rkt to run a container in interactive mode, using the network ‘customrktbridge’, from the image ‘quay.io/coreos/alpine-sh’.  Once the container was running we looked at its interfaces and found that in addition to a loopback interface, it also has an eth0 and an eth1 interface.  Eth0 seems to line up with what we defined as part of our custom CNI network, but what about eth1?  Well eth1 is an interface on what rkt refers to as the ‘default-restricted’ network.  This is one of the networks that rkt provides out of the box.  So what does rkt provide by default?  There are two networks that rkt defines on its own – ‘default’ and ‘default-restricted’. As you might expect, the definitions for these networks are just CNI network configurations, and you can take a look at them right here in the GitHub repo.  Let's review them quickly so we can get an idea of what each provides…

{
	"cniVersion": "0.1.0",
	"name": "default",
	"type": "ptp",
	"ipMasq": true,
	"ipam": {
		"type": "host-local",
		"subnet": "172.16.28.0/24",
		"routes": [
			{ "dst": "0.0.0.0/0" }
		]
	}
}

The above CNI network definition describes the default network.  We can tell that this network uses the ‘ptp’ CNI driver, enables outbound masquerading, uses the host-local IPAM plugin, allocates container IPs from the 172.16.28.0/24 subnet, and installs a default route in the container.  Most of this seems pretty straightforward except for the ptp type.  That's something we haven't talked about yet, but for now just know that it creates a VETH pair for each container.  One end lives on the host and the other lives in the container.  This is different from the default Docker model where the host side of the VETH pair goes into the docker0 bridge which acts as the container's gateway.  In the ptp case, the host-side VETH interfaces are IP'd.  In fact – they're all IP'd using the same IP.  If you created multiple containers with rkt using the default network you'd see a bunch of VETH interfaces on the host, all with 172.16.28.1/24.  In addition, you'd see a host route for each container IP pointing at that container's host-side VETH interface.
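If you want to see this for yourself, start a couple of containers on the default network and then look at the host. Something along these lines (a sketch – the VETH interface names and container IPs will vary on your system) should show the repeated gateway IP and the per-container host routes:

user@ubuntu-1:~$ ip -4 addr show | grep 'inet 172.16.28.1'
user@ubuntu-1:~$ ip route | grep 172.16.28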

{
	"cniVersion": "0.1.0",
	"name": "default-restricted",
	"type": "ptp",
	"ipMasq": false,
	"ipam": {
		"type": "host-local",
		"subnet": "172.17.0.0/16"
	}
}

The above shows the CNI network definition for the default-restricted network, which is what we saw in our output above.  We can tell this network uses the ptp CNI driver, disables outbound masquerading, uses the host-local IPAM plugin, and allocates container IPs out of the 172.17.0.0/16 subnet.  So the real question is why does our container have an interface on this network?  The answer lies in the docs (taken from here)…

The default-restricted network does not set up the default route and IP masquerading. It only allows communication with the host via the veth interface and thus enables the pod to communicate with the metadata service which runs on the host. If default is not among the specified networks, the default-restricted network will be added to the list of networks automatically. It can also be loaded directly by explicitly passing --net=default-restricted.

So that interface is put there intentionally for communication with the metadata service.  Again – this article isn't intended to be a deep dive on rkt networking – but I felt it was important to explain where all the container interfaces come from.  Ok – so now that we've run our container, let's go back and look at our ‘/var/lib/cni/networks’ directory again…

user@ubuntu-1:~$ sudo su
[sudo] password for user:
root@ubuntu-1:/home/user# cd /var/lib/cni/networks/
root@ubuntu-1:/var/lib/cni/networks# ls
customrktbridge  default-restricted  mybridge
root@ubuntu-1:/var/lib/cni/networks#
root@ubuntu-1:/var/lib/cni/networks# cd customrktbridge/
root@ubuntu-1:/var/lib/cni/networks/customrktbridge# ls
10.11.0.2
root@ubuntu-1:/var/lib/cni/networks/customrktbridge# more 10.11.0.2
8d7152a7-9c53-48d8-859e-c8469d5adbdb
root@ubuntu-1:/var/lib/cni/networks/customrktbridge# cd ..
root@ubuntu-1:/var/lib/cni/networks# cd default-restricted/
root@ubuntu-1:/var/lib/cni/networks/default-restricted# ls
172.17.0.2
root@ubuntu-1:/var/lib/cni/networks/default-restricted# more 172.17.0.2
8d7152a7-9c53-48d8-859e-c8469d5adbdb
root@ubuntu-1:/var/lib/cni/networks/default-restricted#

This is what I'd expect to see. Rkt launched a container using CNI that ended up having two interfaces. One was on the ‘customrktbridge’ network we defined, and the other was on the ‘default-restricted’ network that rkt connected for us by default. Since both networks use the host-local IPAM driver, they both got folders in ‘/var/lib/cni/networks/’ and they both have entries showing the assigned IP address as well as the container ID.

If you did a ‘sudo rkt list --full’ you'd see the full container ID, which is ‘8d7152a7-9c53-48d8-859e-c8469d5adbdb’.
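That's also the value host-local wrote into the allocation files above – the pod UUID. You can line the two up yourself with something like this (your UUID will obviously differ):

root@ubuntu-1:~# rkt list --full | grep 8d7152a7-9c53-48d8-859e-c8469d5adbdb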

At this point – we’ve shown how rkt uses CNI to provision container networks and how the host-local IPAM driver stores that information on the host locally.  You might now be wondering if there are other options for IPAM (I know I was).  If so – you’re in luck because by default, CNI also comes with the DHCP IPAM plugin.  So let’s take a look at a custom CNI network definition that uses DHCP for IPAM…

user@ubuntu-1:/etc/rkt/net.d$ sudo su
root@ubuntu-1:/etc/rkt/net.d# cd /etc/rkt/net.d
root@ubuntu-1:/etc/rkt/net.d#
root@ubuntu-1:/etc/rkt/net.d# cat > custom_rkt_bridge_dhcp.conf <<"EOF"
> {
>     "cniVersion": "0.2.0",
>     "name": "customrktbridgedhcp",
>     "type": "macvlan",
>     "master": "ens32",
>     "ipam": {
>         "type": "dhcp"
>     }
> }
> EOF
root@ubuntu-1:/etc/rkt/net.d# exit
exit
user@ubuntu-1:/etc/rkt/net.d$

There are again some new things in this CNI network definition. Namely – you should see that the type of this network is defined as macvlan. In order to use an external DHCP service we need to get the container's network interface right onto the physical network. The easiest way to do this is to use MacVLAN, which puts the container's interface directly onto the host's network. This isn't a post on MacVLAN so I'll be leaving the details of how that works out. For now just know that this works by using the host's interface (in this case ens32) as the parent, or master, interface for the container's interface. You'll also note that we are now using an IPAM type of dhcp rather than host-local. DHCP acts just the way you'd expect: it relies on an external DHCP server to get IP address information for the container. The only catch is that for this to work we need to run CNI's DHCP daemon, which allows the container to get a DHCP address. The DHCP daemon acts as a proxy between the client in the container and the DHCP service that already exists on your network. If you've completed the first two posts in this series you already have that binary in your ~/cni directory. To test this we'll need two SSH sessions to our server. In the first, we'll start CNI's DHCP binary…

user@ubuntu-1:~/cni$ cd ~/cni
user@ubuntu-1:~/cni$ sudo ./dhcp daemon

Since we’re just running the executable here the process will just hang until it needs to do something. In our second window, let’s start a new container using our new network definition…

user@ubuntu-1:~$ sudo rkt run --interactive --net=customrktbridgedhcp quay.io/coreos/alpine-sh
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 92:97:2C:B5:6A:B7
          inet addr:10.20.30.152  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::9097:2cff:feb5:6ab7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:936 (936.0 B)  TX bytes:1206 (1.1 KiB)

eth1      Link encap:Ethernet  HWaddr FE:55:51:EF:27:48
          inet addr:172.17.0.8  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::fc55:51ff:feef:2748/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:508 (508.0 B)  TX bytes:508 (508.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ #

In this case, my DHCP server is allocating IP addresses out of 10.20.30.0/24 so our container ended up with 10.20.30.152. However, if we check the routing table, we’ll see that the container does not have a default route (this seems like something that should work so I opened a GH issue on it here.  In other words – there’s a chance I’m doing this wrong but I don’t think I am)…

/ # ip route
10.20.30.0/24 dev eth0  src 10.20.30.152
172.17.0.0/16 via 172.17.0.1 dev eth1  src 172.17.0.8
172.17.0.1 dev eth1  src 172.17.0.8
/ #
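If you just need connectivity while you're testing, one workaround is to add the route by hand inside the container. A quick sketch, assuming your DHCP scope's gateway is 10.20.30.1 (adjust for your network):

/ # ip route add default via 10.20.30.1 dev eth0
/ # ip route | grep default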

My assumption was that the default route would be learned from the DHCP server (the standard router option) and installed by the plugin, but it was not. If we look back at our first window we can see that the DHCP daemon is working though…

user@ubuntu-1:~/cni$ sudo ./dhcp daemon
2017/03/02 22:41:40 6f4945b1-9a03-4b72-88e8-049c2a1e24ea/customrktbridgedhcp: acquiring lease
2017/03/02 22:41:40 Link "eth0" down. Attempting to set up
2017/03/02 22:41:40 network is down
2017/03/02 22:41:46 6f4945b1-9a03-4b72-88e8-049c2a1e24ea/customrktbridgedhcp: lease acquired, expiration is 2017-03-03 09:41:46.125345466 -0600 CST
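
As an aside – if you ever want to confirm the daemon is ready before launching a container, the dhcp IPAM plugin talks to the daemon over a unix socket. In the plugin version I'm using the default path appears to be /run/cni/dhcp.sock (treat that path as an assumption and check your plugin version if it's not there):

user@ubuntu-1:~$ ls -l /run/cni/dhcp.sock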

So we can see how the DHCP plugin can work – but in its current state it doesn't seem quite usable to me.  I will stress that the CNI plugins provided by default are meant to showcase the possibilities for what CNI can do. I don't believe all of them are meant for, or used in, ‘production’. As we'll see in later posts – other systems use CNI and write their own CNI-compatible plugins.

So what about DNS? We haven't touched on that yet. Do you recall from our first and second posts that when we manually ran the CNI plugin we got a JSON return? Here's a copy and paste from the first post of the output I'm referring to…

{
    "ip4": {
        "ip": "10.15.20.2/24",
        "gateway": "10.15.20.1",
        "routes": [
            {
                "dst": "0.0.0.0/0"
            },
            {
                "dst": "1.1.1.1/32",
                "gw": "10.15.20.1"
            }
        ]
    },
    "dns": {}

See that empty DNS dictionary at the bottom? It's empty because we were using the host-local IPAM driver, which doesn't currently support DNS. But what does supporting DNS even mean in the context of CNI? It doesn't mean what I thought it meant initially. My assumption was that I could pass DNS related parameters to CNI and have it install those settings (DNS name server, search domain, etc.) in the container. That was an incorrect assumption. The DNS parameters are return parameters that CNI can pass back to whatever invoked it. In the case of DHCP – you could see how that would be useful, as CNI could return information it learned from the DHCP server back to rkt in order to configure DNS in the container. Unfortunately, neither of the bundled IPAM drivers (host-local and DHCP) currently supports returning DNS related information, which is why you see an empty DNS dictionary in the CNI JSON response.  There is a current PR in the repo for adding this functionality to the DHCP plugin so if and when that happens we'll revisit it.
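For reference, if an IPAM plugin did return DNS information, the spec says it would come back in that same dictionary with keys for nameservers, domain, search, and options. Something like this (the values here are made up purely for illustration):

"dns": {
    "nameservers": ["10.20.30.1"],
    "domain": "example.com",
    "search": ["example.com"],
    "options": ["ndots:2"]
}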

Next up we’re going to revisit another system that uses CNI (cough, Kubernetes, cough).
