In the last 4 posts we’ve examined the fundamentals of Kubernetes networking…
Kubernetes networking 101 – Pods
Kubernetes networking 101 – Services
Kubernetes networking 101 – (Basic) External access into the cluster
Kubernetes Networking 101 – Ingress resources
My goal with these posts has been to focus on the primitives and to show how a Kubernetes cluster handles networking internally, as well as how it interacts with the upstream or external network. Now that we’ve seen that, I want to dig into a networking plugin for Kubernetes – Calico. Calico is interesting to me as a network engineer because of the wide variety of functionality it offers. To start with though, we’re going to focus on a basic installation. To do that, I’ve updated my Ansible playbook for deploying Kubernetes to incorporate Calico. The playbook can be found here. If you’ve been following along up until this point, you have a couple of options.
- Rebuild the cluster – I emphasized when we started all this that the cluster should be designed exclusively for testing. Starting from scratch is, in my opinion, the best way to make sure you don’t have any lingering configuration. To do that, you can follow the steps here up until it asks you to deploy the KubeDNS pods. You need to deploy Calico before deploying any pods to the cluster!
- Download and rerun the playbook – This should work as well, but I’d encourage you to delete all existing pods before doing this (even the ones in the kube-system namespace!). There are configuration changes on both the master and the minion nodes, so you’ll want to make sure that once the playbook has run, all the services have been restarted. The playbook should do that for you, but if you’re having issues, check there first.
Regardless of which path you choose, I’m going to assume from this point on that you have a fresh Kubernetes cluster which was deployed using my Ansible role. Using my Ansible role is not a requirement, but it does some things for you which I’ll explain along the way, so no worries if you aren’t using it. The goal of this post is to talk about Calico; the lab being used is just a detail if you want to follow along.
So now that we have our lab sorted out – let’s talk about deploying Calico. One of the nicest things about Calico is that it can be deployed through Kubernetes. Awesome! The recommended way to deploy it is to use the Calico manifest which they define over on their site under the Standard Hosted Installation directions. If you’re using my Ansible role, a slightly edited version of this manifest can be found on your master in /var/lib/kubernetes/pod_defs. Let’s take a look at what it defines…
```yaml
# Calico Version v2.1.5
# http://docs.projectcalico.org/v2.1/releases#v2.1.5
# This manifest includes the following component versions:
#   calico/node:v1.1.3
#   calico/cni:v1.8.0
#   calico/kube-policy-controller:v0.5.4

# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Configure this with the location of your etcd cluster.
  etcd_endpoints: "https://ubuntu-1:2379"

  # Configure the Calico backend to use.
  calico_backend: "bird"

  # The CNI network configuration to install on each node.
  cni_network_config: |-
    {
        "name": "k8s-pod-network",
        "type": "calico",
        "etcd_endpoints": "__ETCD_ENDPOINTS__",
        "etcd_key_file": "__ETCD_KEY_FILE__",
        "etcd_cert_file": "__ETCD_CERT_FILE__",
        "etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
        "log_level": "info",
        "ipam": {
            "type": "calico-ipam"
        },
        "policy": {
            "type": "k8s",
            "k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
            "k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
        },
        "kubernetes": {
            "kubeconfig": "__KUBECONFIG_FILEPATH__"
        }
    }

  # If you're using TLS enabled etcd uncomment the following.
  # You must also populate the Secret below with these files.
  etcd_ca: "/calico-secrets/etcd-ca"
  etcd_cert: "/calico-secrets/etcd-cert"
  etcd_key: "/calico-secrets/etcd-key"

---

# The following contains k8s Secrets for use with a TLS enabled etcd cluster.
# For information on populating Secrets, see http://kubernetes.io/docs/user-guide/secrets/
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: calico-etcd-secrets
  namespace: kube-system
data:
  # Populate the following files with etcd TLS configuration if desired, but leave blank if
  # not using TLS for etcd.
  # This self-hosted install expects three files with the following names. The values
  # should be base64 encoded strings of the entire contents of each file.
  etcd-key: PUT YOUR BASE64 ENCODED KEY HERE!
  etcd-cert: PUT YOUR BASE64 ENCODED CERT HERE!
  etcd-ca: PUT YOUR BASE64 ENCODED CA HERE!

---

# This manifest installs the calico/node container, as well
# as the Calico CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  template:
    metadata:
      labels:
        k8s-app: calico-node
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        scheduler.alpha.kubernetes.io/tolerations: |
          [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
           {"key":"CriticalAddonsOnly", "operator":"Exists"}]
    spec:
      hostNetwork: true
      containers:
        # Runs calico/node container on each Kubernetes node. This
        # container programs network policy and routes on each
        # host.
        - name: calico-node
          image: quay.io/calico/node:v1.1.3
          env:
            # The location of the Calico etcd cluster.
            - name: ETCD_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_endpoints
            # Choose the backend to use.
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
            # Disable file logging so `kubectl logs` works.
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "true"
            # Set Felix endpoint to host default action to ACCEPT.
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            # Configure the IP Pool from which Pod IPs will be chosen.
            - name: CALICO_IPV4POOL_CIDR
              value: "10.100.0.0/16"
            - name: CALICO_IPV4POOL_IPIP
              value: "always"
            # Disable IPv6 on Kubernetes.
            - name: FELIX_IPV6SUPPORT
              value: "false"
            # Set Felix logging to "info"
            - name: FELIX_LOGSEVERITYSCREEN
              value: "info"
            # Location of the CA certificate for etcd.
            - name: ETCD_CA_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_ca
            # Location of the client key for etcd.
            - name: ETCD_KEY_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_key
            # Location of the client certificate for etcd.
            - name: ETCD_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_cert
            # Auto-detect the BGP IP address.
            - name: IP
              value: ""
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: 250m
          volumeMounts:
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /var/run/calico
              name: var-run-calico
              readOnly: false
            - mountPath: /calico-secrets
              name: etcd-certs
        # This container installs the Calico CNI binaries
        # and CNI network config file on each node.
        - name: install-cni
          image: quay.io/calico/cni:v1.8.0
          command: ["/install-cni.sh"]
          env:
            # The location of the Calico etcd cluster.
            - name: ETCD_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_endpoints
            # The CNI network config to install on each node.
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: cni_network_config
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
            - mountPath: /calico-secrets
              name: etcd-certs
      volumes:
        # Used by calico/node.
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: var-run-calico
          hostPath:
            path: /var/run/calico
        # Used to install CNI.
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d
        # Mount in the etcd TLS secrets.
        - name: etcd-certs
          secret:
            secretName: calico-etcd-secrets

---

# This manifest deploys the Calico policy controller on Kubernetes.
# See https://github.com/projectcalico/k8s-policy
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: calico-policy-controller
  namespace: kube-system
  labels:
    k8s-app: calico-policy
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ''
    scheduler.alpha.kubernetes.io/tolerations: |
      [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
       {"key":"CriticalAddonsOnly", "operator":"Exists"}]
spec:
  # The policy controller can only have a single active instance.
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      name: calico-policy-controller
      namespace: kube-system
      labels:
        k8s-app: calico-policy
    spec:
      # The policy controller must run in the host network namespace so that
      # it isn't governed by policy that would prevent it from working.
      hostNetwork: true
      containers:
        - name: calico-policy-controller
          image: quay.io/calico/kube-policy-controller:v0.5.4
          env:
            # The location of the Calico etcd cluster.
            - name: ETCD_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_endpoints
            # Location of the CA certificate for etcd.
            - name: ETCD_CA_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_ca
            # Location of the client key for etcd.
            - name: ETCD_KEY_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_key
            # Location of the client certificate for etcd.
            - name: ETCD_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_cert
            # The location of the Kubernetes API. Use the default Kubernetes
            # service for API access.
            - name: K8S_API
              value: "https://kubernetes.default:443"
            # Since we're running in the host namespace and might not have KubeDNS
            # access, configure the container's /etc/hosts to resolve
            # kubernetes.default to the correct service clusterIP.
            - name: CONFIGURE_ETC_HOSTS
              value: "true"
          volumeMounts:
            # Mount in the etcd TLS secrets.
            - mountPath: /calico-secrets
              name: etcd-certs
      volumes:
        # Mount in the etcd TLS secrets.
        - name: etcd-certs
          secret:
            secretName: calico-etcd-secrets
```
That’s a lot, so let’s walk through what the manifest defines. The first thing it defines is a config-map that Calico uses to set high level parameters for the installation. Calico relies on an etcd key-value store for some of its functions, so this is where we define the location of that. In this case, I’m using the same etcd cluster that I’m using for Kubernetes. Again – this is a lab – sharing the Kubernetes etcd with Calico is not recommended in non-lab environments. So in my case, I point the etcd_endpoints parameter at the host ubuntu-1 on port 2379. Since we’re using cert-based auth for etcd, I also need to tell Calico where the certs for that live. To do that, you just need to un-comment lines 46-48 in the config-map. Do not change these values assuming you need to point them at a real file location on the host!
The second item the manifest defines is a Kubernetes secret which we populate with the etcd TLS information if we’re using it. We are, so we need to populate these fields (lines 46-48) with base64 encoded versions of each of these items. Again – this is something that Ansible will do for you if you use my role. If not, you need to manually insert the values (I removed them from the file just to save space). We haven’t talked about secrets specifically, but they are a means to share sensitive information with objects inside the Kubernetes cluster.
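If you’re populating the secret by hand, the encoding step looks something like this. The file path here is a throwaway for illustration; point the function at wherever your real etcd key, cert, and CA actually live. Note the `-w 0` – the secret expects a single unwrapped base64 string for each file:

```shell
# Sketch: base64-encode an etcd TLS file on a single line, ready to
# paste into the calico-etcd-secrets Secret. Paths are hypothetical.
encode() { base64 -w 0 "$1"; echo; }

# Demo with a throwaway file; substitute your real key/cert/CA paths.
printf 'demo-key-material' > /tmp/etcd-key
encode /tmp/etcd-key
```

Running encode against each of the three files gives you the values for the etcd-key, etcd-cert, and etcd-ca fields.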
The third item the manifest defines is a daemon-set. Daemon-sets are a means to deploy a specific workload to every Kubernetes node or minion. Say I had a logging agent I wanted on each host: deploying it as a daemon-set lets Kubernetes manage that for me. If I join a new node to the cluster, Kubernetes will start the logging agent on that node as well. In this case, the daemon-set is for Calico and consists of two containers. The calico-node container is the brains of the operation and does most of the heavy lifting. This is also where we changed the CALICO_IPV4POOL_CIDR parameter from the default to 10.100.0.0/16. This is not required, but I wanted to keep the pod IP addresses in that subnet for my lab. The install-cni container takes care of creating the correct CNI definitions on each host so that Kubernetes can consume Calico through a CNI plugin. Once it completes this task, it goes to sleep and never wakes back up. We’ll talk more about the CNI definitions below.
The fourth and final piece of the manifest defines the Calico policy controller. We won’t be talking about that piece of Calico in this post, so just hold tight on that one for now.
So let’s deploy this file to the cluster…
```
user@ubuntu-1:~$ cd /var/lib/kubernetes/pod_defs/
user@ubuntu-1:/var/lib/kubernetes/pod_defs$ kubectl apply -f calico.yaml
configmap "calico-config" created
secret "calico-etcd-secrets" created
daemonset "calico-node" created
deployment "calico-policy-controller" created
user@ubuntu-1:/var/lib/kubernetes/pod_defs$
```
Alright – now let’s run our net-test pod again so we have a testing point…
```
user@ubuntu-1:~$ kubectl run net-test --image=jonlangemak/net_tools
deployment "net-test" created
user@ubuntu-1:~$
```
Once it’s running, let’s check and see what the pod received for networking.
```
user@ubuntu-1:~$ kubectl exec net-test-645963977-thv8m -- ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 addrgenmode eui64
2: tunl0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0 ipip remote any local any ttl inherit nopmtudisc
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 7e:7e:2f:f3:bb:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 veth addrgenmode eui64
user@ubuntu-1:~$
user@ubuntu-1:~$ kubectl exec net-test-645963977-thv8m ifconfig
eth0      Link encap:Ethernet  HWaddr 7e:7e:2f:f3:bb:30
          inet addr:10.100.163.129  Bcast:0.0.0.0  Mask:255.255.255.255
          inet6 addr: fe80::7c7e:2fff:fef3:bb30/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1908 (1.9 KB)  TX bytes:1908 (1.9 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
user@ubuntu-1:~$
```
First we notice that the eth0 interface is actually one end of a VETH pair. We see that its peer is interface index 5, which on the host is an interface called cali182f84bfeba@if4. So the container’s network namespace is connected back to the host using a VETH pair. This is very similar to how most container networking solutions work, with one minor change: the host side of the VETH pair is not connected to a bridge. It just lives by itself in the default or root namespace. We’ll talk more about the implications of this later on in this post. Next we notice that the pod received an IP address of 10.100.163.129. This doesn’t seem unusual since that was the pod CIDR we had defined in previous labs, but if you look at the kube-controller-manager service definition, you’ll notice that we no longer configure that option…
```
user@ubuntu-1:~$ more /etc/systemd/system/kube-controller-manager.service
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/usr/bin/kube-controller-manager \
  --allocate-node-cidrs=false \
  --cluster-name=my_cluster \
  --leader-elect=true \
  --master=http://10.20.30.71:8080 \
  --root-ca-file=/var/lib/kube_certs/ca.pem \
  --service-account-private-key-file=/var/lib/kube_certs/kubernetes-key.pem \
  --service-cluster-ip-range=10.11.12.0/24 \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
user@ubuntu-1:~$
```
Notice that the --cluster-cidr parameter is missing entirely and that the --allocate-node-cidrs parameter has been changed to false. This means that Kubernetes is no longer allocating pod CIDR networks to the nodes. So how are the pods getting IP addresses now? The answer lies in the kubelet configuration…
```
user@ubuntu-3:~$ more /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/bin/kubelet \
  --allow-privileged=true \
  --api-servers=https://10.20.30.71:6443 \
  --cloud-provider= \
  --cluster-dns=10.11.12.254 \
  --cluster-domain=k8s.cluster.local \
  --container-runtime=docker \
  --docker=unix:///var/run/docker.sock \
  --network-plugin=cni \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --reconcile-cidr=true \
  --serialize-image-pulls=false \
  --tls-cert-file=/var/lib/kube_certs/kubernetes.pem \
  --tls-private-key-file=/var/lib/kube_certs/kubernetes-key.pem \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
user@ubuntu-3:~$
```
Our --network-plugin parameter changed from kubenet to cni. This means that we’re using native CNI to provision container networking. When doing so, Kubernetes acts as follows…
- The CNI plugin is selected by passing Kubelet the --network-plugin=cni command-line option. Kubelet reads a file from --cni-conf-dir (default /etc/cni/net.d) and uses the CNI configuration from that file to set up each pod’s network. The CNI configuration file must match the CNI specification, and any required CNI plugins referenced by the configuration must be present in --cni-bin-dir (default /opt/cni/bin).
- If there are multiple CNI configuration files in the directory, the first one in lexicographic order of file name is used.
- In addition to the CNI plugin specified by the configuration file, Kubernetes requires the standard CNI lo plugin, at minimum version 0.2.0.
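That lexicographic-ordering rule is easy to see with a quick sketch (the 99-loopback.conf file name here is made up purely for illustration):

```shell
# Sketch: kubelet picks the first config file in lexicographic order,
# so a 10- prefix beats a 99- prefix.
confdir=$(mktemp -d)
touch "$confdir/10-calico.conf" "$confdir/99-loopback.conf"
ls "$confdir" | head -n 1   # -> 10-calico.conf
```

This is why Calico names its config 10-calico.conf – the low prefix makes it sort first if anything else lands in the directory.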
Since we didn’t specify --cni-conf-dir or --cni-bin-dir, the kubelet will look in the default path for each. So let’s check out what’s in the --cni-conf-dir (/etc/cni/net.d) now…
```
user@ubuntu-3:~$ cd /etc/cni/net.d
user@ubuntu-3:/etc/cni/net.d$ ls
10-calico.conf  calico-kubeconfig  calico-tls
user@ubuntu-3:/etc/cni/net.d$
user@ubuntu-3:/etc/cni/net.d$ more 10-calico.conf
{
    "name": "k8s-pod-network",
    "type": "calico",
    "etcd_endpoints": "https://ubuntu-1:2379",
    "etcd_key_file": "/etc/cni/net.d/calico-tls/etcd-key",
    "etcd_cert_file": "/etc/cni/net.d/calico-tls/etcd-cert",
    "etcd_ca_cert_file": "/etc/cni/net.d/calico-tls/etcd-ca",
    "log_level": "info",
    "ipam": {
        "type": "calico-ipam"
    },
    "policy": {
        "type": "k8s",
        "k8s_api_root": "https://10.11.12.1:443",
        "k8s_auth_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkZWZhdWx0LXRva2VuLXRsaGhzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImRlZmF1bHQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI3NTY4OTkwYy0zNDJjLTExZTctOGVjZS0wMDBjMjkzZTQ5NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCJ9.lP2R3p-RzOeo2h6kd2NvaTPcSppWXE4va2ZQEqqSx1y1UQBPvPuYBLx6X7EB20hkVkO5UEHJEBWdMOU5C7xKUAYBD7iqTtOaFlrsMnpn99bL2rJJoJV-UCRttcLVschvsl1sn8cl8xCGvhm94NevD1j5IXioqTeeBCumuPfk5N8W9nN4xfHnrC8bF0HON8CAcLurrK2ZpE6Z9TFxA9rBy803UxotLYPhT01iSALhLP-u43JX5Jj_fky-onLf6rK3bh4DRRCugomEFi5m6CRTklJQvKod1xfxf3DdV9J04cN3qV3Sh0snKM04omPObeJudhVCmpm5-ZevbhlN2LR2WQ"
    },
    "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
    }
}
user@ubuntu-3:/etc/cni/net.d$ more calico-kubeconfig
# Kubeconfig file for Calico CNI plugin.
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    insecure-skip-tls-verify: true
users:
- name: calico
contexts:
- name: calico-context
  context:
    cluster: local
    user: calico
current-context: calico-context
user@ubuntu-3:/etc/cni/net.d$
```
There’s quite a bit here, and all of these files were written by Calico – specifically, by the install-cni container. We can verify that by checking its logs…
```
user@ubuntu-3:/etc/cni/net.d$ sudo docker logs d042855bb84a
Installing any TLS assets from /calico-secrets
Wrote Calico CNI binaries to /host/opt/cni/bin/
CNI plugin version: v1.8.0
Wrote CNI config: {
    "name": "k8s-pod-network",
    "type": "calico",
    "etcd_endpoints": "https://ubuntu-1:2379",
    "etcd_key_file": "/etc/cni/net.d/calico-tls/etcd-key",
    "etcd_cert_file": "/etc/cni/net.d/calico-tls/etcd-cert",
    "etcd_ca_cert_file": "/etc/cni/net.d/calico-tls/etcd-ca",
    "log_level": "info",
    "ipam": {
        "type": "calico-ipam"
    },
    "policy": {
        "type": "k8s",
        "k8s_api_root": "https://10.11.12.1:443",
        "k8s_auth_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkZWZhdWx0LXRva2VuLW01M3FxIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImRlZmF1bHQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIyZjY1YjU3OC0zNTliLTExZTctODgwYS0wMDBjMjkzZTQ5NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCJ9.kowTGr92p6ICzHX48-bJu-0F6Re81J1Eq-kmQLZFYcXhIDaOCbKDtp-Jtc8LdQxd87zKyiqbnh8UCBP60hpxYBI8J90WGrwNqqmr6XXmIWNXfwIUyCo-XmUmxtFLD671eat4z8ya1nD4TeMK6Zu8rIsByRa6hpK_uUQdW7TES-DnOzwCV3Ogj_hZ67rDF4KEWj98SFq50osVYI6Mf8NYksSn-0iiygY3oFS8Ir82cUzGfLNx6sAkULOnUF5o31CYSClOILH_vPd3HFIIgy4X503VZpbEPRjF8UuWFYoDIhQePOtYrMjq9jlOqCP1NBhGhvlQ8N0P2ndS2DVt8oHuZQ"
    },
    "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
    }
}
Done configuring CNI.  Sleep=true
```
As we can see from the container log on each host, the install-cni container created the CNI binaries if they didn’t exist (they may have already existed if you were using the previous lab build). It then created the CNI configuration and the associated kubeconfig file for CNI to use. It also created the /etc/cni/net.d/calico-tls directory and placed the certs required to talk to etcd in that directory. It got this information from /calico-secrets, which is really the secret calico-etcd-secrets that we created in the Calico manifest; the secret just happens to be mounted into the container at /calico-secrets. The CNI definition also specifies that a plugin of calico should be used, which we’ll find does exist in the /opt/cni/bin directory. It also specifies an IPAM plugin of calico-ipam, meaning that Calico is taking care of our IP address assignment as well. One other interesting thing to point out is that the CNI definition lists the information required to talk to the Kubernetes API. To do this, it uses the default pod service account token. If you’re curious how pods get the token to talk to the API server, check out this piece of documentation that talks about default service accounts and credentials in Kubernetes. Lastly, the install-cni container created a kubeconfig file which specifies some further Kubernetes connectivity parameters.
So running the Calico manifest did quite a lot for us. Each node now has the Calico CNI plugins and the means to talk to the Kubernetes API. Now that we know Calico is driving the IP address allocation for the hosts, what about the actual networking side of things? Let’s take a closer look at the routing for the net-test container…
```
user@ubuntu-1:~$ kubectl exec -it net-test-645963977-thv8m ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0  scope link
user@ubuntu-1:~$
```
Well, this is strange. The default route is pointing at 169.254.1.1. Let’s look on the host this container is running on and see what interfaces exist…
```
user@ubuntu-3:~$ ifconfig
cali182f84bfeba Link encap:Ethernet  HWaddr 2e:7e:32:de:8c:a3
          inet6 addr: fe80::2c7e:32ff:fede:8ca3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:648 (648.0 B)  TX bytes:648 (648.0 B)

docker0   Link encap:Ethernet  HWaddr 02:42:17:5f:98:a6
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ens32     Link encap:Ethernet  HWaddr 00:0c:29:78:24:28
          inet addr:10.20.30.73  Bcast:10.20.30.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe78:2428/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1725392 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1442228 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:759641266 (759.6 MB)  TX bytes:133830278 (133.8 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:186 errors:0 dropped:0 overruns:0 frame:0
          TX packets:186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:13652 (13.6 KB)  TX bytes:13652 (13.6 KB)

tunl0     Link encap:IPIP Tunnel  HWaddr
          inet addr:10.100.163.128  Mask:255.255.255.255
          UP RUNNING NOARP  MTU:1440  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
user@ubuntu-3:~$
```
Nothing matching that IP address here. So what’s going on? How can a container route via an IP that doesn’t exist? Let’s walk through what’s happening. Some of you might have noticed that 169.254.1.1 is an IPv4 link-local address. The container has a default route pointing at a link-local address, meaning that the container expects this IP to be reachable on its directly connected interface – in this case, the container’s eth0 interface. The container will ARP for that IP address when it wants to send traffic through the default route. Since our container hasn’t talked to anything yet, we have the opportunity to capture its ARP request on the host. Let’s set up a tcpdump on the host ubuntu-3 and then use kubectl exec on the master to try talking to the outside world…
```
user@ubuntu-1:~$ kubectl exec -it net-test-645963977-thv8m -- ping 4.2.2.2 -c 1
PING 4.2.2.2 (4.2.2.2): 56 data bytes
64 bytes from 4.2.2.2: icmp_seq=0 ttl=50 time=28.979 ms
--- 4.2.2.2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 28.979/28.979/28.979/0.000 ms
user@ubuntu-1:~$

user@ubuntu-3:~$ sudo tcpdump -i cali182f84bfeba -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cali182f84bfeba, link-type EN10MB (Ethernet), capture size 262144 bytes
11:07:37.523892 ARP, Request who-has 169.254.1.1 tell 10.100.163.129, length 28
11:07:37.523917 ARP, Reply 169.254.1.1 is-at 2e:7e:32:de:8c:a3, length 28
11:07:37.523937 IP 10.100.163.129 > 4.2.2.2: ICMP echo request, id 230, seq 0, length 64
11:07:37.551884 IP 4.2.2.2 > 10.100.163.129: ICMP echo reply, id 230, seq 0, length 64
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
user@ubuntu-3:~$
```
In the top output, we have the container send a single ping to 4.2.2.2. This follows the container’s default route and causes it to ARP for its gateway at 169.254.1.1. In the bottom output, you see the capture on the host ubuntu-3. Notice we captured on the interface cali182f84bfeba, which is the host side of the VETH pair connecting the container back to the root or default network namespace on the host. In the tcpdump output we see the container, with a source of 10.100.163.129, send an ARP request. The reply comes from 2e:7e:32:de:8c:a3 which, if we reference the output above, is the MAC address of the host-side VETH interface cali182f84bfeba. So you might be wondering how on earth the host is replying to an ARP request for an IP address it doesn’t have configured on any interface. The answer is proxy-ARP. If we check the host-side VETH interface, we’ll see that proxy-ARP is enabled…
```
user@ubuntu-3:~$ cat /proc/sys/net/ipv4/conf/cali182f84bfeba/proxy_arp
1
user@ubuntu-3:~$
```
By enabling proxy-ARP on this interface, Calico is instructing the host to reply to the ARP request on behalf of someone else, that is, by proxy. The rules for proxy-ARP are simple. A host which has proxy-ARP enabled will reply to ARP requests with its own MAC address when…
- The host receives an ARP request on an interface which has proxy-ARP enabled.
- The host knows how to reach the destination.
- The interface the host would use to reach the destination is not the same one on which it received the ARP request.
So in this case, the container is sending an ARP request for 169.254.1.1. Despite this being a link-local address, the host would attempt to route it by following its default route out the host’s physical interface. This means we’ve met all three requirements, so the host replies to the ARP request with its own MAC address.
Note: If you’re curious about these requirements, go ahead and try them out yourself. For requirement 1, you can disable proxy-ARP on the interface with echo 0 > /proc/sys/net/ipv4/conf/<interface name goes here>/proxy_arp. For requirement 2, simply remove the host’s default route (make sure you have a more specific route, or some other means to reach the host, before you do that!) like so: sudo ip route del 0.0.0.0/0. For the third requirement, point the route 169.254.0.0/16 at the VETH interface itself, like this: sudo ip route add 169.254.0.0/16 dev <Calico VETH interface name>. If you do any of these, the container will no longer be able to access the outside world. Part of me wonders if this makes it a bit fragile, but I also assume that most hosts will have a default route.
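If you want a quick inventory of which interfaces on a host currently have proxy-ARP enabled, you can walk the /proc entries. On a Calico node you’d expect each cali* interface to show a 1. This loop is a generic sketch, not Calico tooling:

```shell
# List the proxy_arp flag for every interface the kernel knows about.
for f in /proc/sys/net/ipv4/conf/*/proxy_arp; do
  iface=$(basename "$(dirname "$f")")
  printf '%s %s\n' "$iface" "$(cat "$f")"
done
```

On ubuntu-3 above, cali182f84bfeba would show 1 while ens32, docker0, and lo would show 0.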
The ARP process for the container would look like this…
In this case, the proxy-ARP requirements are met, since the host has a default route it can follow for the destination of 169.254.1.1, so it replies to the container with its own MAC address. At this point, the container believes it has a valid ARP entry for its default gateway and starts sending normal traffic toward the host. It’s a pretty clever configuration, but one that takes some time to understand.
I mentioned above that the host side of the container’s VETH pair just lives in the host’s default or root namespace. In other container implementations, this interface would be attached to a common bridge so that all connected containers could talk to one another directly. In that scenario, the bridge is commonly allocated an IP address, giving the host an address on the same subnet as the containers. This allows the host to talk (do things like ARP) to the containers directly, and it allows the containers themselves to talk directly to one another through the bridge. This describes a layer 2 scenario where the host and all containers attached to the bridge can ARP for each other’s IP addresses directly. Since we don’t have the bridge, we need to tell the host how to route to each container. If we look at the host’s routing table, we’ll see that we have a /32 route for the IP of our net-test container…
```
user@ubuntu-3:~$ ip route
default via 10.20.30.1 dev ens32 onlink
10.20.30.0/24 dev ens32  proto kernel  scope link  src 10.20.30.73
10.100.5.192/26 via 192.168.50.74 dev tunl0  proto bird onlink
10.100.138.192/26 via 192.168.50.75 dev tunl0  proto bird onlink
blackhole 10.100.163.128/26  proto bird
10.100.163.129 dev cali182f84bfeba  scope link
10.100.243.0/26 via 10.20.30.72 dev tunl0  proto bird onlink
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown
user@ubuntu-3:~$
```
The /32 route points the container’s IP address at the host side of the VETH pair. We also notice some other unusual routes in the same routing table…
These routes are inserted by Calico and represent the pod subnets allocated to the hosts in our Kubernetes cluster (the blackhole route covers this host’s own allocation). We can see that Calico is allocating a /26 network to each host…
10.100.243.0/26 – Ubuntu-2
10.100.163.128/26 – Ubuntu-3
10.100.5.192/26 – Ubuntu-4
10.100.138.192/26 – Ubuntu-5
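The allocation math above can be sketched with Python’s `ipaddress` module. Note that the cluster-wide pod CIDR (10.100.0.0/16 here) is an assumption inferred from the routes we observed, not something the route table states directly:

```python
import ipaddress

# Assumed cluster pod CIDR, inferred from the per-host routes above;
# adjust to whatever your deployment actually uses.
cluster_cidr = ipaddress.ip_network("10.100.0.0/16")

# The /26 blocks Calico handed out to each host, as listed above.
host_blocks = {
    "ubuntu-2": ipaddress.ip_network("10.100.243.0/26"),
    "ubuntu-3": ipaddress.ip_network("10.100.163.128/26"),
    "ubuntu-4": ipaddress.ip_network("10.100.5.192/26"),
    "ubuntu-5": ipaddress.ip_network("10.100.138.192/26"),
}

for host, block in sorted(host_blocks.items()):
    # Each /26 is carved from the cluster CIDR and holds 64 addresses,
    # i.e. room for up to 64 pod IPs per host from this one block.
    print(host, block, block.subnet_of(cluster_cidr), block.num_addresses)
```

A /26 leaves room for 64 pod IPs per host; if a host needs more, Calico can allocate it an additional block.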
Notice that these destinations are reachable through the tunl0 interface, which is Calico’s IPIP overlay transport tunnel. This means that we don’t need to tell the upstream or physical network how to get to each pod CIDR range since that routing happens in the overlay. It also means that we can no longer reach the pod IP addresses directly from outside the cluster. This conforms more closely with what the Kubernetes documentation describes when it says that the pod networks are not routable externally. In our previous examples they were reachable since we were manually routing the subnets to each host.
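The kernel’s choice between the local /32 cali route and the remote /26 tunl0 routes is plain longest-prefix matching, which can be sketched like this (routes abbreviated from ubuntu-3’s table above; this is an illustration, not how Calico itself is implemented):

```python
import ipaddress

# Abbreviated routes from ubuntu-3's table: remote pod /26s point at the
# tunl0 IPIP tunnel, the local pod has a /32 pointing at its cali veth,
# and the host's own /26 is a blackhole catch-all for unassigned pod IPs.
routes = [
    (ipaddress.ip_network("10.100.5.192/26"),   "tunl0 via 192.168.50.74"),
    (ipaddress.ip_network("10.100.138.192/26"), "tunl0 via 192.168.50.75"),
    (ipaddress.ip_network("10.100.163.128/26"), "blackhole"),
    (ipaddress.ip_network("10.100.163.129/32"), "cali182f84bfeba"),
    (ipaddress.ip_network("10.100.243.0/26"),   "tunl0 via 10.20.30.72"),
]

def lookup(dst):
    """Pick the matching route with the longest prefix, as the kernel does."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, nh) for net, nh in routes if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("10.100.5.200"))    # remote pod -> IPIP tunnel
print(lookup("10.100.163.129"))  # local pod: the /32 wins over the blackhole /26
```

This is why traffic to a local pod goes straight out the cali veth while anything else in that host’s /26 hits the blackhole instead of looping back into the overlay.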
We’ve just barely scratched the surface of Calico in this post, but it should be enough to get you up and running. In the next post we’ll talk about how Calico shares routing and reachability information between the hosts.
Hi, very enlightening article. I deployed Calico manually following the documentation on a simple cluster (1 master and 1 worker), and the strange issue I have is that when I fire up two net-test images, they can’t ping each other. Inter-pod communication on the same worker seems to be broken. In terms of control plane, both cali interfaces were created, proxy-ARP has been enabled on both, and I also have the two routes pointing to them with dev scope link. My worker has multiple interfaces. I tried 4.2.2.2 but could not get any echo reply. This seems to be a forwarding issue. Any idea/hint where this issue could come from? I suspect an iptables issue. (I have a similar issue with kubenet and need to manually add chains for cbr0: sudo iptables -A FORWARD -i cbr0 -j ACCEPT; sudo iptables -A FORWARD -o cbr0 -j ACCEPT.) Would it be possible to dump your iptables? Thanks anyway for ALL of these articles.
Sorry for the delay in replying. What do the route tables look like on the pods and on the host itself?
Thanks for your reply. I found the cause of the problem. When you do a manual install (in order to understand the automatic installation), everything is blocked by default by iptables because of the default-deny network policy. The documentation explicitly mentions that all authorized traffic must be explicitly allowed, but it is unclear how to do that, at least for a Calico newcomer. Once I applied a default profile per namespace (kube-system and the default namespace), everything was back up. To see whether there is an iptables issue, just run iptables-save and analyze the packet “jump” sequences. I have another issue now. I tweaked my installation so much that I don’t know why the /26 and blackhole routes are not being installed by the bird container. Any idea where to look or how to troubleshoot this? The Calico logs are not meaningful enough (at least from my perspective). Thanks! Bgrds, Frederic
Found the issue. The problem was that my minions had multiple interfaces, and Felix resolved the hostname using gethostbyname while my calico-node was registered using the IP address of the interface. Setting FELIX_FELIXHOSTNAME solved the issue. When you see such behaviour, just recursively ls etcd. Cheers, Frederic
I have tried to replicate your setup and am having trouble receiving ping responses. I believe I meet the three proxy-ARP requirements, but my traffic looks like this:
root@kube-worker-1:/# tcpdump -i cali2b450175527 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cali2b450175527, link-type EN10MB (Ethernet), capture size 262144 bytes
18:34:15.327511 IP 10.192.6.153 > 10.32.19.50: ICMP echo request, id 95, seq 0, length 64
18:34:20.754962 ARP, Request who-has 169.254.1.1 tell 10.192.6.153, length 28
18:34:20.755006 ARP, Reply 169.254.1.1 is-at e6:d2:de:c8:5b:99, length 28
The routing table on the host looks like this:
default via 10.192.6.1 dev ens160 proto dhcp src 10.192.6.6 metric 1024
10.192.6.0/25 dev ens160 proto kernel scope link src 10.192.6.6
10.192.6.1 dev ens160 proto dhcp scope link src 10.192.6.6 metric 1024
blackhole 10.192.6.128/26 proto bird
10.192.6.142 dev califc066d97bf4 scope link
10.192.6.144 dev cali14d8083e67e scope link
10.192.6.145 dev cali42fe1db519e scope link
10.192.6.153 dev cali2b450175527 scope link
10.192.6.192/26 via 10.32.19.191 dev tunl0 proto bird onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
And the IP address of the veth link looks like this:
14: cali2b450175527@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether e6:d2:de:c8:5b:99 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::e4d2:deff:fec8:5b99/64 scope link
valid_lft forever preferred_lft forever
On the host, I am able to reach the host I am trying to ping from the container like this:
root@kube-worker-1:/# ping 10.32.19.50
PING 10.32.19.50 (10.32.19.50): 56 data bytes
64 bytes from 10.32.19.50: icmp_seq=0 ttl=63 time=0.472 ms
And the ip address and routing table in the container look like this:
root@nettools-deployment-428150981-rv6pt:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 1e:f2:de:1d:1f:b1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.192.6.153/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::1cf2:deff:fe1d:1fb1/64 scope link
valid_lft forever preferred_lft forever
root@nettools-deployment-428150981-rv6pt:/# ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
Any ideas as to why I wouldn’t be getting a ping response? My calico.yaml is identical to yours except that I am using slightly newer Calico containers and I have this additional config at the end:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-policy-controller
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-node
  namespace: kube-system
Thanks,
Thomas