Getting started with Calico on Kubernetes

In the last 4 posts we’ve examined the fundamentals of Kubernetes networking…

Kubernetes networking 101 – Pods

Kubernetes networking 101 – Services

Kubernetes networking 101 – (Basic) External access into the cluster

Kubernetes Networking 101 – Ingress resources

My goal with these posts has been to focus on the primitives and to show how a Kubernetes cluster handles networking internally as well as how it interacts with the upstream or external network.  Now that we’ve seen that, I want to dig into a networking plugin for Kubernetes – Calico.  Calico is interesting to me as a network engineer because of the wide variety of functionality that it offers.  To start with though, we’re going to focus on a basic installation.  To do that, I’ve updated my Ansible playbook for deploying Kubernetes to incorporate Calico.  The playbook can be found here.  If you’ve been following along up until this point, you have a couple of options.

  • Rebuild the cluster – I emphasized when we started all this that the cluster should be designed exclusively for testing.  Starting from scratch is always the best in my opinion if you’re looking to make sure you don’t have any lingering configuration.  To do that you can follow the steps here up until it asks you to deploy the KubeDNS pods.  You need to deploy Calico before deploying any pods to the cluster!
  • Download and rerun the playbook – This should work as well, but I’d encourage you to delete all existing pods before doing this (even the ones in the kube-system namespace!).  There are configuration changes that occur on both the master and the minion nodes, so you’ll want to make sure that all the services have been restarted once the playbook is run.  The playbook should do that for you, but if you’re having issues check there first.

Regardless of which path you choose, I’m going to assume from this point on that you have a fresh Kubernetes cluster which was deployed using my Ansible role.  Using my Ansible role is not a requirement but it does some things for you which I’ll explain along the way so no worries if you aren’t using it.  The goal of this post is to talk about Calico, the lab being used is just a detail if you want to follow along.

So now that we have our lab sorted out – let’s talk about deploying Calico.  One of the nicest things about Calico is that it can be deployed through Kubernetes.  Awesome!  The recommended way to deploy it is to use the Calico manifest which they define over on their site under the Standard Hosted Installation directions.  If you’re using my Ansible role, a slightly edited version of this manifest can be found on your master in /var/lib/kubernetes/pod_defs.  Let’s take a look at what it defines…

# Calico Version v2.1.5
# http://docs.projectcalico.org/v2.1/releases#v2.1.5
# This manifest includes the following component versions:
#   calico/node:v1.1.3
#   calico/cni:v1.8.0
#   calico/kube-policy-controller:v0.5.4

# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Configure this with the location of your etcd cluster.
  etcd_endpoints: "https://ubuntu-1:2379"

  # Configure the Calico backend to use.
  calico_backend: "bird"

  # The CNI network configuration to install on each node.
  cni_network_config: |-
    {
        "name": "k8s-pod-network",
        "type": "calico",
        "etcd_endpoints": "__ETCD_ENDPOINTS__",
        "etcd_key_file": "__ETCD_KEY_FILE__",
        "etcd_cert_file": "__ETCD_CERT_FILE__",
        "etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
        "log_level": "info",
        "ipam": {
            "type": "calico-ipam"
        },
        "policy": {
            "type": "k8s",
            "k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
            "k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
        },
        "kubernetes": {
            "kubeconfig": "__KUBECONFIG_FILEPATH__"
        }
    }

  # If you're using TLS enabled etcd uncomment the following.
  # You must also populate the Secret below with these files.
  etcd_ca: "/calico-secrets/etcd-ca"
  etcd_cert: "/calico-secrets/etcd-cert"
  etcd_key: "/calico-secrets/etcd-key"

---

# The following contains k8s Secrets for use with a TLS enabled etcd cluster.
# For information on populating Secrets, see http://kubernetes.io/docs/user-guide/secrets/
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: calico-etcd-secrets
  namespace: kube-system
data:
  # Populate the following files with etcd TLS configuration if desired, but leave blank if
  # not using TLS for etcd.
  # This self-hosted install expects three files with the following names.  The values
  # should be base64 encoded strings of the entire contents of each file.
  etcd-key: PUT YOUR BASE64 ENCODED KEY HERE!
  etcd-cert: PUT YOUR BASE64 ENCODED CERT HERE!
  etcd-ca:  PUT YOUR BASE64 ENCODED CA HERE!

---

# This manifest installs the calico/node container, as well
# as the Calico CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  template:
    metadata:
      labels:
        k8s-app: calico-node
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        scheduler.alpha.kubernetes.io/tolerations: |
          [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
           {"key":"CriticalAddonsOnly", "operator":"Exists"}]
    spec:
      hostNetwork: true
      containers:
        # Runs calico/node container on each Kubernetes node.  This
        # container programs network policy and routes on each
        # host.
        - name: calico-node
          image: quay.io/calico/node:v1.1.3
          env:
            # The location of the Calico etcd cluster.
            - name: ETCD_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_endpoints
            # Choose the backend to use.
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
            # Disable file logging so `kubectl logs` works.
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "true"
            # Set Felix endpoint to host default action to ACCEPT.
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            # Configure the IP Pool from which Pod IPs will be chosen.
            - name: CALICO_IPV4POOL_CIDR
              value: "10.100.0.0/16"
            - name: CALICO_IPV4POOL_IPIP
              value: "always"
            # Disable IPv6 on Kubernetes.
            - name: FELIX_IPV6SUPPORT
              value: "false"
            # Set Felix logging to "info"
            - name: FELIX_LOGSEVERITYSCREEN
              value: "info"
            # Location of the CA certificate for etcd.
            - name: ETCD_CA_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_ca
            # Location of the client key for etcd.
            - name: ETCD_KEY_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_key
            # Location of the client certificate for etcd.
            - name: ETCD_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_cert
            # Auto-detect the BGP IP address.
            - name: IP
              value: ""
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: 250m
          volumeMounts:
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /var/run/calico
              name: var-run-calico
              readOnly: false
            - mountPath: /calico-secrets
              name: etcd-certs
        # This container installs the Calico CNI binaries
        # and CNI network config file on each node.
        - name: install-cni
          image: quay.io/calico/cni:v1.8.0
          command: ["/install-cni.sh"]
          env:
            # The location of the Calico etcd cluster.
            - name: ETCD_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_endpoints
            # The CNI network config to install on each node.
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: cni_network_config
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
            - mountPath: /calico-secrets
              name: etcd-certs
      volumes:
        # Used by calico/node.
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: var-run-calico
          hostPath:
            path: /var/run/calico
        # Used to install CNI.
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d
        # Mount in the etcd TLS secrets.
        - name: etcd-certs
          secret:
            secretName: calico-etcd-secrets

---

# This manifest deploys the Calico policy controller on Kubernetes.
# See https://github.com/projectcalico/k8s-policy
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: calico-policy-controller
  namespace: kube-system
  labels:
    k8s-app: calico-policy
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ''
    scheduler.alpha.kubernetes.io/tolerations: |
      [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
       {"key":"CriticalAddonsOnly", "operator":"Exists"}]
spec:
  # The policy controller can only have a single active instance.
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      name: calico-policy-controller
      namespace: kube-system
      labels:
        k8s-app: calico-policy
    spec:
      # The policy controller must run in the host network namespace so that
      # it isn't governed by policy that would prevent it from working.
      hostNetwork: true
      containers:
        - name: calico-policy-controller
          image: quay.io/calico/kube-policy-controller:v0.5.4
          env:
            # The location of the Calico etcd cluster.
            - name: ETCD_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_endpoints
            # Location of the CA certificate for etcd.
            - name: ETCD_CA_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_ca
            # Location of the client key for etcd.
            - name: ETCD_KEY_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_key
            # Location of the client certificate for etcd.
            - name: ETCD_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_cert
            # The location of the Kubernetes API.  Use the default Kubernetes
            # service for API access.
            - name: K8S_API
              value: "https://kubernetes.default:443"
            # Since we're running in the host namespace and might not have KubeDNS
            # access, configure the container's /etc/hosts to resolve
            # kubernetes.default to the correct service clusterIP.
            - name: CONFIGURE_ETC_HOSTS
              value: "true"
          volumeMounts:
            # Mount in the etcd TLS secrets.
            - mountPath: /calico-secrets
              name: etcd-certs
      volumes:
        # Mount in the etcd TLS secrets.
        - name: etcd-certs
          secret:
            secretName: calico-etcd-secrets

That’s a lot, so let’s walk through what the manifest defines. The first thing the manifest defines is a ConfigMap that Calico uses to hold high level parameters about the Calico installation. Calico relies on an etcd key/value store for some of its functions, so this is where we define the location of that. In this case, I’m using the same etcd cluster that I’m using for Kubernetes. Again – this is a lab – doing that is not recommended in non-lab environments. So in my case, I point the etcd_endpoints parameter at the host ubuntu-1 on port 2379. Since we’re using cert based auth for etcd, I also need to tell Calico where the certs are for that. To do that you just need to un-comment the etcd_ca, etcd_cert, and etcd_key entries at the bottom of the ConfigMap. Do not change these values to point at a real file location on the host! They are paths inside the Calico containers, populated from the Secret below.

The second item the manifest defines is a Kubernetes Secret, which we populate with the etcd TLS information if we’re using it.  We are, so we need to populate the etcd-key, etcd-cert, and etcd-ca fields with base64 encoded versions of each of those files.  Again – this is something that Ansible will do for you if you use my role. If not, you need to manually insert the values (I removed them from the file just to save space). We haven’t talked about Secrets specifically, but they are a means to share sensitive information with objects inside the Kubernetes cluster.
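If you’re populating the Secret by hand, the values are just the raw file contents run through base64. A quick sketch (the PEM content here is a stand-in; you’d read the bytes of your real etcd key, cert, and CA files):

```python
import base64

# Stand-in for the bytes of a real etcd TLS file; in practice you'd read
# the actual key, cert, and CA files used by your etcd cluster.
pem_bytes = b"-----BEGIN PRIVATE KEY-----\nMIIEv...example...\n-----END PRIVATE KEY-----\n"

# The Secret expects the base64 of the *entire* file as a single string.
encoded = base64.b64encode(pem_bytes).decode("ascii")
print(encoded)

# Sanity check: decoding must round-trip back to the original file contents.
assert base64.b64decode(encoded) == pem_bytes
```

The same value is what `base64 -w 0 <file>` prints on a Linux shell; paste one such string into each of the etcd-key, etcd-cert, and etcd-ca fields.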

The third item the manifest defines is a daemon set. Daemon sets are a means to deploy a specific workload to every Kubernetes node or minion. So say I had a logging agent that I wanted on each system. Deploying it as a daemon set allows Kubernetes to manage that for me. If I join a new node to the cluster, Kubernetes will start the logging agent on that node as well. In this case, the daemon set is for Calico and consists of two containers. The node container is the brains of the operation and does most of the heavy lifting. This is also where we changed the CALICO_IPV4POOL_CIDR parameter from the default to 10.100.0.0/16. This is not required, but I wanted to keep the pod IP addresses in that subnet for my lab. The install-cni container takes care of creating the correct CNI definitions on each host so that Kubernetes can consume Calico through a CNI plugin. Once it completes this task it goes to sleep and never wakes back up. We’ll talk more about the CNI definitions below.

The fourth and final piece of the manifest defines the Calico policy controller. We won’t be talking about that piece of Calico in this post, so just hold tight on that one for now.

So let’s deploy this file to the cluster…

user@ubuntu-1:~$ cd /var/lib/kubernetes/pod_defs/
user@ubuntu-1:/var/lib/kubernetes/pod_defs$ kubectl apply -f calico.yaml
configmap "calico-config" created
secret "calico-etcd-secrets" created
daemonset "calico-node" created
deployment "calico-policy-controller" created
user@ubuntu-1:/var/lib/kubernetes/pod_defs$

Alright – now let’s run our net-test pod again so we have a testing point…

user@ubuntu-1:~$ kubectl run net-test --image=jonlangemak/net_tools
deployment "net-test" created
user@ubuntu-1:~$

Once running let’s check and see what the pod received for networking.

user@ubuntu-1:~$ kubectl exec net-test-645963977-thv8m -- ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 addrgenmode eui64
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0
    ipip remote any local any ttl inherit nopmtudisc
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 7e:7e:2f:f3:bb:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0
    veth addrgenmode eui64
user@ubuntu-1:~$
user@ubuntu-1:~$ kubectl exec net-test-645963977-thv8m ifconfig
eth0      Link encap:Ethernet  HWaddr 7e:7e:2f:f3:bb:30
          inet addr:10.100.163.129  Bcast:0.0.0.0  Mask:255.255.255.255
          inet6 addr: fe80::7c7e:2fff:fef3:bb30/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1908 (1.9 KB)  TX bytes:1908 (1.9 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

user@ubuntu-1:~$
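An aside on reading `ip -d link` output: an interface name carries an `@if<N>` suffix, where N is the ifindex of that interface’s peer on the other side of the namespace boundary. A quick way to pull that out (the sample line is illustrative):

```python
import re

# Illustrative line of `ip -d link` output from inside a pod.
line = "4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP"

# The @if<N> suffix names the ifindex of this interface's VETH peer,
# which lives in the host's root network namespace.
match = re.search(r"@if(\d+):", line)
peer_ifindex = int(match.group(1))
print(peer_ifindex)  # -> 5
```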

First we notice that the eth0 interface is actually one side of a VETH pair. We see that its peer is interface index 5, which on the host is the interface cali182f84bfeba. So the container’s network namespace is connected back to the host using a VETH pair. This is very similar to how most container networking solutions work, with one minor change: the host side of the VETH pair is not connected to a bridge. It just lives by itself in the default or root namespace. We’ll talk more about the implications of this later on in this post. Next we notice that the pod received an IP address of 10.100.163.129. This doesn’t seem unusual since that was the pod CIDR we had defined in previous labs, but if you look at the kube-controller-manager service definition, you’ll notice that we no longer configure that option…

user@ubuntu-1:~$ more /etc/systemd/system/kube-controller-manager.service
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/usr/bin/kube-controller-manager \
  --allocate-node-cidrs=false \
  --cluster-name=my_cluster \
  --leader-elect=true \
  --master=http://10.20.30.71:8080 \
  --root-ca-file=/var/lib/kube_certs/ca.pem \
  --service-account-private-key-file=/var/lib/kube_certs/kubernetes-key.pem \
  --service-cluster-ip-range=10.11.12.0/24 \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
user@ubuntu-1:~$

Notice that the --cluster-cidr parameter is missing entirely and that the --allocate-node-cidrs parameter has been changed to false. This means that Kubernetes is no longer allocating pod CIDR networks to the nodes. So how are the pods getting IP addresses now? The answer to that lies in the kubelet configuration…

user@ubuntu-3:~$ more /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/bin/kubelet \
  --allow-privileged=true \
  --api-servers=https://10.20.30.71:6443 \
  --cloud-provider= \
  --cluster-dns=10.11.12.254 \
  --cluster-domain=k8s.cluster.local \
  --container-runtime=docker \
  --docker=unix:///var/run/docker.sock \
  --network-plugin=cni \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --reconcile-cidr=true \
  --serialize-image-pulls=false \
  --tls-cert-file=/var/lib/kube_certs/kubernetes.pem \
  --tls-private-key-file=/var/lib/kube_certs/kubernetes-key.pem \
  --v=2

Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
user@ubuntu-3:~$

Our --network-plugin setting changed from kubenet to cni. This means that we’re using native CNI to provision container networking. When doing so, Kubernetes acts as follows…

The CNI plugin is selected by passing Kubelet the --network-plugin=cni command-line option. Kubelet reads a file from --cni-conf-dir (default /etc/cni/net.d) and uses the CNI configuration from that file to set up each pod’s network. The CNI configuration file must match the CNI specification, and any required CNI plugins referenced by the configuration must be present in --cni-bin-dir (default /opt/cni/bin).
If there are multiple CNI configuration files in the directory, the first one in lexicographic order of file name is used.
In addition to the CNI plugin specified by the configuration file, Kubernetes requires the standard CNI lo plugin, at minimum version 0.2.0.
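That ordering rule is worth noting: the kubelet doesn’t merge configs, it simply takes the first file name in lexicographic order. A sketch of the selection (the other file names are made up for illustration):

```python
# Hypothetical contents of /etc/cni/net.d on a node; only 10-calico.conf
# is real here (written by Calico's install-cni container).
conf_files = ["99-loopback.conf", "10-calico.conf", "20-example.conf"]

# Kubelet-style selection: the first config file in lexicographic order wins.
chosen = sorted(conf_files)[0]
print(chosen)  # -> 10-calico.conf
```

The `10-` prefix on Calico’s config file exists precisely so it sorts ahead of other configs.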

Since we didn’t specify --cni-conf-dir or --cni-bin-dir, the kubelet will look in the default path for each.  So let’s check out what’s in the --cni-conf-dir (/etc/cni/net.d) now…

user@ubuntu-3:~$ cd /etc/cni/net.d
user@ubuntu-3:/etc/cni/net.d$ ls
10-calico.conf  calico-kubeconfig  calico-tls
user@ubuntu-3:/etc/cni/net.d$
user@ubuntu-3:/etc/cni/net.d$ more 10-calico.conf
{
    "name": "k8s-pod-network",
    "type": "calico",
    "etcd_endpoints": "https://ubuntu-1:2379",
    "etcd_key_file": "/etc/cni/net.d/calico-tls/etcd-key",
    "etcd_cert_file": "/etc/cni/net.d/calico-tls/etcd-cert",
    "etcd_ca_cert_file": "/etc/cni/net.d/calico-tls/etcd-ca",
    "log_level": "info",
    "ipam": {
        "type": "calico-ipam"
    },
    "policy": {
        "type": "k8s",
        "k8s_api_root": "https://10.11.12.1:443",
        "k8s_auth_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW
8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkZWZhdWx0LXRva2VuLXRsaGhzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImRlZmF1bHQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2Ut
YWNjb3VudC51aWQiOiI3NTY4OTkwYy0zNDJjLTExZTctOGVjZS0wMDBjMjkzZTQ5NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCJ9.lP2R3p-RzOeo2h6kd2NvaTPcSppWXE4va2ZQEqqSx1y1UQBPvPuYBLx6X7EB20hkV
kO5UEHJEBWdMOU5C7xKUAYBD7iqTtOaFlrsMnpn99bL2rJJoJV-UCRttcLVschvsl1sn8cl8xCGvhm94NevD1j5IXioqTeeBCumuPfk5N8W9nN4xfHnrC8bF0HON8CAcLurrK2ZpE6Z9TFxA9rBy803UxotLYPhT01iSALhLP-u43JX5Jj_fky-onLf6rK3bh4DRRCugom
EFi5m6CRTklJQvKod1xfxf3DdV9J04cN3qV3Sh0snKM04omPObeJudhVCmpm5-ZevbhlN2LR2WQ"
    },
    "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
    }
}
user@ubuntu-3:/etc/cni/net.d$ more calico-kubeconfig
# Kubeconfig file for Calico CNI plugin.
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    insecure-skip-tls-verify: true
users:
- name: calico
contexts:
- name: calico-context
  context:
    cluster: local
    user: calico
current-context: calico-context
user@ubuntu-3:/etc/cni/net.d$
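To make the structure of that config concrete, here’s a rough sketch of how a CNI-aware runtime might load it and decide which binaries to execute. The field names come from the 10-calico.conf above (trimmed; token and TLS paths omitted), but the loader itself is illustrative, not actual kubelet code:

```python
import json

# A trimmed copy of the 10-calico.conf shown above.
raw = """
{
    "name": "k8s-pod-network",
    "type": "calico",
    "etcd_endpoints": "https://ubuntu-1:2379",
    "log_level": "info",
    "ipam": {"type": "calico-ipam"},
    "kubernetes": {"kubeconfig": "/etc/cni/net.d/calico-kubeconfig"}
}
"""
conf = json.loads(raw)

# Per the CNI spec, "type" names the plugin binary to execute from the CNI
# bin dir (/opt/cni/bin by default), and "ipam.type" names the IPAM binary.
print(conf["type"])          # -> calico
print(conf["ipam"]["type"])  # -> calico-ipam
```

This is why we expect to find both a `calico` and a `calico-ipam` binary sitting in /opt/cni/bin.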

There’s quite a bit here, and all of these files were written by Calico – specifically, by the install-cni container. We can verify that by checking its logs…

user@ubuntu-3:/etc/cni/net.d$ sudo docker logs d042855bb84a
Installing any TLS assets from /calico-secrets
Wrote Calico CNI binaries to /host/opt/cni/bin/
CNI plugin version: v1.8.0
Wrote CNI config: {
    "name": "k8s-pod-network",
    "type": "calico",
    "etcd_endpoints": "https://ubuntu-1:2379",
    "etcd_key_file": "/etc/cni/net.d/calico-tls/etcd-key",
    "etcd_cert_file": "/etc/cni/net.d/calico-tls/etcd-cert",
    "etcd_ca_cert_file": "/etc/cni/net.d/calico-tls/etcd-ca",
    "log_level": "info",
    "ipam": {
        "type": "calico-ipam"
    },
    "policy": {
        "type": "k8s",
        "k8s_api_root": "https://10.11.12.1:443",
        "k8s_auth_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkZWZhdWx0LXRva2VuLW01M3FxIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImRlZmF1bHQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIyZjY1YjU3OC0zNTliLTExZTctODgwYS0wMDBjMjkzZTQ5NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCJ9.kowTGr92p6ICzHX48-bJu-0F6Re81J1Eq-kmQLZFYcXhIDaOCbKDtp-Jtc8LdQxd87zKyiqbnh8UCBP60hpxYBI8J90WGrwNqqmr6XXmIWNXfwIUyCo-XmUmxtFLD671eat4z8ya1nD4TeMK6Zu8rIsByRa6hpK_uUQdW7TES-DnOzwCV3Ogj_hZ67rDF4KEWj98SFq50osVYI6Mf8NYksSn-0iiygY3oFS8Ir82cUzGfLNx6sAkULOnUF5o31CYSClOILH_vPd3HFIIgy4X503VZpbEPRjF8UuWFYoDIhQePOtYrMjq9jlOqCP1NBhGhvlQ8N0P2ndS2DVt8oHuZQ"
    },
    "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
    }
}
Done configuring CNI.  Sleep=true

As we can see from the log of the container on each host, the install-cni container created the CNI binaries if they didn’t exist (these may have already existed if you were using the previous lab build). It then created the CNI network config and the associated kubeconfig file for CNI to use. It also created the /etc/cni/net.d/calico-tls directory and placed the certs required to talk to etcd in that directory. It got this information from the /calico-secrets mount, which is really the calico-etcd-secrets Secret that we created in the Calico manifest; the Secret just happens to be mounted into the container at /calico-secrets. The CNI definition also specifies that a plugin of calico should be used, which we’ll find does exist in the /opt/cni/bin directory. It also specifies an IPAM plugin of calico-ipam, meaning that Calico is also taking care of our IP address assignment. One other interesting thing to point out is that the CNI definition lists the information required to talk to the Kubernetes API. To do this, it’s using the default pod token.  If you’re curious how the pods get the token to talk to the API server, check out this piece of documentation that talks about default service accounts and credentials in Kubernetes.  Lastly, the install-cni container created a kubeconfig file which specifies some further Kubernetes connectivity parameters.

So running the Calico manifest did quite a lot for us.  Each node now has the Calico CNI plugins and the means to talk to the Kubernetes API.  So now that we know Calico is driving the IP address allocation for the pods, what about the actual networking side of things?  Let’s take a closer look at the routing for the net-test container…

user@ubuntu-1:~$ kubectl exec -it net-test-645963977-thv8m ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0  scope link
user@ubuntu-1:~$

Well this is strange. The default route is pointing at 169.254.1.1. Let’s look on the host this container is running on and see what interfaces exist…

user@ubuntu-3:~$ ifconfig
cali182f84bfeba Link encap:Ethernet  HWaddr 2e:7e:32:de:8c:a3
          inet6 addr: fe80::2c7e:32ff:fede:8ca3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:648 (648.0 B)  TX bytes:648 (648.0 B)

docker0   Link encap:Ethernet  HWaddr 02:42:17:5f:98:a6
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ens32     Link encap:Ethernet  HWaddr 00:0c:29:78:24:28
          inet addr:10.20.30.73  Bcast:10.20.30.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe78:2428/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1725392 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1442228 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:759641266 (759.6 MB)  TX bytes:133830278 (133.8 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:186 errors:0 dropped:0 overruns:0 frame:0
          TX packets:186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:13652 (13.6 KB)  TX bytes:13652 (13.6 KB)

tunl0     Link encap:IPIP Tunnel  HWaddr
          inet addr:10.100.163.128  Mask:255.255.255.255
          UP RUNNING NOARP  MTU:1440  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

user@ubuntu-3:~$

Nothing matching that IP address here. So what’s going on? How can a container route at an IP that doesn’t exist? Let’s walk through what’s happening. Some of you reading this might have noticed that 169.254.1.1 is an IPv4 link-local address.  The container has a default route pointing at a link-local address, meaning that the container expects this IP address to be reachable on its directly connected interface, in this case the container’s eth0 interface. The container will attempt to ARP for that IP address when it wants to route out through the default route. Since our container hasn’t talked to anything yet, we have the opportunity to capture its ARP request on the host. Let’s set up a tcpdump on the host ubuntu-3 and then use kubectl exec on the master to try talking to the outside world…

user@ubuntu-1:~$ kubectl exec -it net-test-645963977-thv8m -- ping 4.2.2.2 -c 1
PING 4.2.2.2 (4.2.2.2): 56 data bytes
64 bytes from 4.2.2.2: icmp_seq=0 ttl=50 time=28.979 ms
--- 4.2.2.2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 28.979/28.979/28.979/0.000 ms
user@ubuntu-1:~$

user@ubuntu-3:~$ sudo tcpdump -i cali182f84bfeba -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cali182f84bfeba, link-type EN10MB (Ethernet), capture size 262144 bytes
11:07:37.523892 ARP, Request who-has 169.254.1.1 tell 10.100.163.129, length 28
11:07:37.523917 ARP, Reply 169.254.1.1 is-at 2e:7e:32:de:8c:a3, length 28
11:07:37.523937 IP 10.100.163.129 > 4.2.2.2: ICMP echo request, id 230, seq 0, length 64
11:07:37.551884 IP 4.2.2.2 > 10.100.163.129: ICMP echo reply, id 230, seq 0, length 64
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
user@ubuntu-3:~$

In the top output you can see we have the container send a single ping to 4.2.2.2. This will follow the container’s default route and cause it to ARP for its gateway at 169.254.1.1. In the bottom output you see the capture on the host ubuntu-3. Notice we did the capture on the interface cali182f84bfeba, which is the host side of the VETH pair connecting the container back to the root or default network namespace on the host. In the tcpdump output we see the container, with a source of 10.100.163.129, send an ARP request. The reply comes from 2e:7e:32:de:8c:a3 which, if we reference the output above, we’ll see is the MAC address of the host-side VETH interface cali182f84bfeba. So you might be wondering how on earth the host is replying to an ARP request for an IP address it doesn’t have an interface on. The answer is proxy ARP. If we check the host-side VETH interface we’ll see that proxy ARP is enabled…

user@ubuntu-3:~$ cat /proc/sys/net/ipv4/conf/cali182f84bfeba/proxy_arp
1
user@ubuntu-3:~$

By enabling proxy ARP on this interface, Calico is instructing the host to reply to the ARP request on behalf of someone else, that is, through proxy. The rules for proxy ARP are simple. A host which has proxy ARP enabled will reply to ARP requests with its own MAC address when…

  • The host receives the ARP request on an interface which has proxy-ARP enabled.
  • The host has a route to the destination.
  • The interface the host would use to reach the destination is not the same one it received the ARP request on.

So in this case, the container is sending an ARP request for 169.254.1.1.  Despite this being a link-local address, the host would attempt to route it by following its default route out the host’s physical interface.  This means we’ve met all three requirements, so the host replies to the ARP request with its own MAC address.

Note: If you’re curious about these requirements, go ahead and try them out yourself.  For requirement 1 you can disable proxy-ARP on the interface with echo 0 > /proc/sys/net/ipv4/conf/<interface name goes here>/proxy_arp.  For requirement 2, simply remove the host’s default route (make sure you have a 10’s route or some other means to reach the host before you do that!) like so: sudo ip route del 0.0.0.0/0.  For the third requirement, point the route 169.254.0.0/16 at the VETH interface itself like this: sudo ip route add 169.254.0.0/16 dev <Calico VETH interface name>.  If you do any of these, the container will no longer be able to access the outside world.  Part of me wonders if this makes things a bit fragile, but I also assume that most hosts will have a default route.
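The three requirements can be sketched as a small decision function. This is just an illustration of the proxy-ARP logic, not Calico or kernel code; the route lookup is reduced to a toy longest-prefix match helper, and the route data is modeled on the Ubuntu-3 table:

```python
import ipaddress

def lookup_route(routes, dst):
    """Toy longest-prefix match: return the outgoing interface for dst, or None."""
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if ipaddress.ip_address(dst) in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, iface)
    return best[1] if best else None

def will_proxy_arp(routes, proxy_arp_enabled, in_iface, target_ip):
    # Rule 1: proxy-ARP must be enabled on the receiving interface
    if not proxy_arp_enabled.get(in_iface, False):
        return False
    # Rule 2: the host must have a route to the target
    out_iface = lookup_route(routes, target_ip)
    if out_iface is None:
        return False
    # Rule 3: the route must not point back out the receiving interface
    return out_iface != in_iface

# Routes modeled on the Ubuntu-3 host: a default route out ens32
routes = [("0.0.0.0/0", "ens32"), ("10.20.30.0/24", "ens32")]
proxy = {"cali182f84bfeba": True}

# The container ARPs for 169.254.1.1 on the cali interface: all three
# rules pass (the default route exits ens32), so the host answers the ARP.
print(will_proxy_arp(routes, proxy, "cali182f84bfeba", "169.254.1.1"))  # True
```

Flipping any one input, disabling proxy-ARP, dropping the default route, or pointing 169.254.0.0/16 back at the cali interface, makes the function return False, which mirrors the three breakage experiments in the note above.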

The ARP process for the container would look like this…

In this case, the proxy-ARP requirements are met since the host has a default route it can follow for the destination of 169.254.1.1, so it replies to the container with its own MAC address.  At this point, the container believes it has a valid ARP entry for its default gateway and will start sending normal traffic toward the host.  It’s a pretty clever configuration, but one that takes some time to understand.

I mentioned above that the host side of the container VETH pair just lives in the host’s default (or root) namespace. In other container implementations, this interface would be attached to a common bridge so that all connected containers could talk to one another directly. In that scenario, the bridge is commonly allocated an IP address, giving the host an IP address on the same subnet as the containers.  This allows the host to talk (do things like ARP) to the containers directly, and it allows the containers themselves to talk to one another through the bridge. This describes a layer 2 scenario where the host and all containers attached to the bridge can ARP for each other’s IP addresses directly. Since we don’t have the bridge, we need to tell the host how to route to each container. If we look at the host’s routing table, we’ll see that we have a /32 route for the IP of our net-test container…

[email protected]:~$ ip route
default via 10.20.30.1 dev ens32 onlink
10.20.30.0/24 dev ens32  proto kernel  scope link  src 10.20.30.73
10.100.5.192/26 via 192.168.50.74 dev tunl0  proto bird onlink
10.100.138.192/26 via 192.168.50.75 dev tunl0  proto bird onlink
blackhole 10.100.163.128/26  proto bird
10.100.163.129 dev cali182f84bfeba  scope link
10.100.243.0/26 via 10.20.30.72 dev tunl0  proto bird onlink
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown
[email protected]:~$

The route points the container’s IP address at the host side of the VETH pair. We also notice some other unusual routes in the same table: a set of /26 routes pointing at the tunl0 interface, plus a blackhole route covering this host’s own allocation.

The /26 routes are inserted by Calico and represent the pod subnets allocated to the other hosts in our Kubernetes cluster. We can see that Calico allocates a /26 network to each host…

10.100.243.0/26 – Ubuntu-2
10.100.163.128/26 – Ubuntu-3
10.100.5.192/26 – Ubuntu-4
10.100.138.192/26 – Ubuntu-5
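These blocks appear to be carved out of a cluster-wide pod pool; judging from the addresses, my lab is using 10.100.0.0/16, though that’s an inference from the route tables rather than something we’ve confirmed yet. A /26 gives each host 64 pod addresses, and a /16 pool holds 1024 such blocks, which is easy to sanity-check with Python’s ipaddress module:

```python
import ipaddress

# Assuming the pod pool is 10.100.0.0/16 (inferred from the addresses
# above), carve it into the /26 blocks Calico hands out per host.
pool = ipaddress.ip_network("10.100.0.0/16")
blocks = list(pool.subnets(new_prefix=26))

print(len(blocks))              # 1024 blocks available in the pool
print(blocks[0].num_addresses)  # 64 addresses per host block

# Each per-host subnet from the routing table is one of these blocks
for prefix in ("10.100.243.0/26", "10.100.163.128/26",
               "10.100.5.192/26", "10.100.138.192/26"):
    assert ipaddress.ip_network(prefix) in blocks
```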

Notice that these destinations are reachable through the tunl0 interface, which is Calico’s IPIP overlay transport tunnel. This means we don’t need to tell the upstream (physical) network how to get to each pod CIDR range, since that’s handled in the overlay. It also means we can no longer reach the pod IP addresses directly from outside the cluster. This conforms more closely with what the Kubernetes documentation describes when it says that pod networks are not routable externally.  In our previous examples they were reachable, since we were manually routing the subnets to each host.
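Also worth calling out is how the blackhole route fits in with the /32 pod routes. The kernel always uses the most specific matching route, so traffic to a local pod hits its /32 route, while traffic to an address in this host’s /26 that has no pod behind it is dropped by the blackhole. A quick longest-prefix-match sketch makes this concrete; the route data is copied from the Ubuntu-3 table above, while the selection logic is my own illustration of what the kernel does:

```python
import ipaddress

# (prefix, action) pairs copied from the Ubuntu-3 routing table above
routes = [
    ("0.0.0.0/0",         "via 10.20.30.1 dev ens32"),
    ("10.100.5.192/26",   "via 192.168.50.74 dev tunl0"),
    ("10.100.138.192/26", "via 192.168.50.75 dev tunl0"),
    ("10.100.163.128/26", "blackhole"),
    ("10.100.163.129/32", "dev cali182f84bfeba"),
    ("10.100.243.0/26",   "via 10.20.30.72 dev tunl0"),
]

def best_route(dst):
    """Pick the most specific matching route, as the kernel does."""
    candidates = [(ipaddress.ip_network(p), action) for p, action in routes
                  if ipaddress.ip_address(dst) in ipaddress.ip_network(p)]
    return max(candidates, key=lambda c: c[0].prefixlen)[1]

print(best_route("10.100.163.129"))  # local pod: the /32 beats the blackhole /26
print(best_route("10.100.163.130"))  # no pod at this address yet: blackholed
print(best_route("10.100.5.200"))    # pod on another host: IPIP tunnel to it
```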

We’ve just barely scratched the surface of Calico in this post, but it should be enough to get you up and running. In the next post we’ll talk about how Calico shares routing and reachability information between the hosts.

7 thoughts on “Getting started with Calico on Kubernetes”

  1. Frederic LOUI

    Hi, very enlightening article. I deployed Calico manually, following the documentation, on a simple cluster (1 master and 1 worker), and the strange issue I have is that when I fire up 2 net-test images, they can’t ping each other. Inter-pod communication on the same worker seems to be broken. In terms of control plane, both cali interfaces were created, proxy-ARP has been enabled on both, and I also have the 2 scope link routes pointing at the devices. My worker has multiple interfaces. I tried 4.2.2.2 but could not get any echo reply. Seems that this is a forwarding issue. Any idea/hint where this issue comes from? I suspect an iptables issue. (I have a similar issue with kubenet and need to manually add chains on cbr0: sudo iptables -A FORWARD -i cbr0 -j ACCEPT; sudo iptables -A FORWARD -o cbr0 -j ACCEPT.) Would it be possible to dump your iptables? Thanks anyway for ALL of these articles.

    1. Jon Langemak (post author)

      Sorry for the delay in replying. What do the route tables look like on the pods and on the host itself?

      1. Frederic LOUI

        Thanks for your reply. I found the cause of this problem. When you do a manual install (in order to understand the automatic installation), everything is blocked by default by iptables because of the default-deny network policy. The documentation explicitly mentions that all authorized traffic must be explicitly allowed, but it is unclear how to do so, at least for a Calico newcomer. Once I applied a default profile per namespace (kube-system, default), everything was back up. To see if there is an iptables issue, just run iptables-save and analyze the packet “jump” sequences. I have another issue now: I tweaked my installation so much that I don’t know why the /26 routes and blackhole routes are not installed by the bird container. Any idea where to look or how to troubleshoot this? The Calico logs are not meaningful enough (at least from my perspective). Thanks! Bgrds, Frederic

        1. Frederic LOUI

          Found the issue. The problem was that my minions had multiple interfaces and Felix resolves the hostname using gethostbyname, while my calico-node was registered using the IP address of the interface. Setting FELIX_FELIXHOSTNAME solved the issue. When you see such behaviour, just “recursively ls” etcd. Cheers, Frederic

  2. Thomas Peters-Hall

    I have tried to replicate your setup and am having trouble receiving ping responses. I believe I meet the three proxy arp requirements, but my traffic looks like this:

    [email protected]:/# tcpdump -i cali2b450175527 -nn
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on cali2b450175527, link-type EN10MB (Ethernet), capture size 262144 bytes
    18:34:15.327511 IP 10.192.6.153 > 10.32.19.50: ICMP echo request, id 95, seq 0, length 64
    18:34:20.754962 ARP, Request who-has 169.254.1.1 tell 10.192.6.153, length 28
    18:34:20.755006 ARP, Reply 169.254.1.1 is-at e6:d2:de:c8:5b:99, length 28

    The routing table on the host looks like this:

    default via 10.192.6.1 dev ens160 proto dhcp src 10.192.6.6 metric 1024
    10.192.6.0/25 dev ens160 proto kernel scope link src 10.192.6.6
    10.192.6.1 dev ens160 proto dhcp scope link src 10.192.6.6 metric 1024
    blackhole 10.192.6.128/26 proto bird
    10.192.6.142 dev califc066d97bf4 scope link
    10.192.6.144 dev cali14d8083e67e scope link
    10.192.6.145 dev cali42fe1db519e scope link
    10.192.6.153 dev cali2b450175527 scope link
    10.192.6.192/26 via 10.32.19.191 dev tunl0 proto bird onlink
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown

    And the IP address of the veth link looks like this:

    14: [email protected]: mtu 1500 qdisc noqueue state UP group default
    link/ether e6:d2:de:c8:5b:99 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::e4d2:deff:fec8:5b99/64 scope link
    valid_lft forever preferred_lft forever

    On the host, I am able to reach the host I am trying to ping from the container like this:

    [email protected]:/# ping 10.32.19.50
    PING 10.32.19.50 (10.32.19.50): 56 data bytes
    64 bytes from 10.32.19.50: icmp_seq=0 ttl=63 time=0.472 ms

    And the ip address and routing table in the container look like this:

    [email protected]:/# ip addr
    1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
    2: [email protected]: mtu 1480 qdisc noop state DOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    4: [email protected]: mtu 1500 qdisc noqueue state UP group default
    link/ether 1e:f2:de:1d:1f:b1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.192.6.153/32 scope global eth0
    valid_lft forever preferred_lft forever
    inet6 fe80::1cf2:deff:fe1d:1fb1/64 scope link
    valid_lft forever preferred_lft forever
    [email protected]:/# ip route
    default via 169.254.1.1 dev eth0
    169.254.1.1 dev eth0 scope link

    Any ideas as to why I wouldn’t be getting a ping response? My calico.yaml is identical to yours except that I am using slightly newer Calico containers and I have this additional config at the end:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: calico-policy-controller
      namespace: kube-system
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: calico-node
      namespace: kube-system

    Thanks,

    Thomas

