Kubernetes 101 – External access into the cluster

In our last post, we looked at how Kubernetes handles the bulk of its networking.  What we haven’t covered yet is how to access services deployed in the Kubernetes cluster from outside the cluster.  Obviously, services that live in pods can be accessed directly since each pod has its own routable IP address.  But what if we want something a little more dynamic?  What if we used a replication controller to scale our web front end?  We have the Kubernetes service construct, but what I’d call its VIP range (the portal net) isn’t routable on the network.  Let’s walk through the problem and a couple of ways to solve it.  I’ll demonstrate the approach I chose, but that doesn’t imply there aren’t other, better ways as well.

As we’ve seen, Kubernetes has a built-in load balancer which it refers to as a service.  A service is a group of pods that all provide the same function.  Services are accessible to other pods through an IP address allocated out of the cluster’s portal net range.  This works out rather well for the pods, but only because of the way we get traffic to the service.  Recall from our earlier posts that we need some iptables (netfilter) tricks to get traffic to the load balancing mechanism.  That mechanism is the Kubernetes proxy service, and it lives on each and every Kubernetes node.  That being said, the concept of services only works for devices whose traffic passes through the Kubernetes proxy, which limits it to devices that are in the cluster.

So to solve for external access into the cluster, it seems reasonable that we’d need a different type of ‘external’ load balancer.  But what would an external load balancer use as its backend pool members?  The pod IP addresses are routable, so we could load balance directly to them.  However, pods tend to be ephemeral in nature; we’d be constantly updating the load balancer pools as pods came and went for a particular service group.  On the flip side, if we could leverage the service construct, the cluster would take care of finding the pods for us.  Luckily for us, there’s a way to do this.  Let’s look at a quick sample service definition…

id: "webfrontend"
kind: "Service"
apiVersion: "v1beta1"
port: 9090
containerPort: 80
publicIPs: ["10.20.30.62", "10.20.30.63", "192.168.10.64", "192.168.10.65"]
selector:
  name: "web80"
labels:
  name: "webservice"

Notice we added a new field called ‘publicIPs’.  By defining public IP addresses, we’re telling the cluster that we want this service reachable on those particular IPs.  You’ll also note that the public IPs I define are the IP addresses of the physical Kubernetes nodes.  That’s purely a matter of simplicity; I could assign any IP address I wanted as a ‘public IP’ so long as the network knew how to get traffic for it to the Kubernetes cluster.
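For example, if I wanted to use a public IP that doesn’t live on any node, the upstream network just needs a route for it pointing at the cluster.  A minimal sketch of that might look like this (the 10.99.99.99 address and the choice of next hop are hypothetical, and a static route pinned to a single node obviously doesn’t survive that node failing)…

# On the upstream router (or any host that needs to reach the service),
# send traffic for the hypothetical public IP to one of the Kubernetes nodes
ip route add 10.99.99.99/32 via 10.20.30.62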

Note: This works quite differently if you’re running on GCE.  There you can just tell the service to provision an external load balancer and the magic of Google’s awesome network does it for you.  Recall that in these posts I’m dealing with a bare metal lab.

So what does this really do?  Let’s deploy it to the cluster and then check the netfilter rules on one of our Kubernetes nodes by dumping them with ‘iptables-save’…

Note: I’m not going to step through all the commands I use to build this lab.  If you don’t know how to deploy constructs into Kubernetes, go back and read this post.

[Image: iptables-save output from a Kubernetes node showing the netfilter rules created for the webfrontend service]
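To give a rough idea of what that output contains, here’s a hand-written sketch of the kind of rules the userspace kube-proxy programs in this era (the service IP and ports come from the definition above; the local redirect port of 40000 is made up)…

# Rules hit by traffic arriving from containers and the outside world (nat PREROUTING)
-A KUBE-PORTALS-CONTAINER -d 10.100.49.241/32 -p tcp --dport 9090 -j REDIRECT --to-ports 40000
-A KUBE-PORTALS-CONTAINER -d 10.20.30.62/32 -p tcp --dport 9090 -j REDIRECT --to-ports 40000
# ...one identical rule for each remaining public IP...
# Rules hit by traffic generated on the node itself (nat OUTPUT)
-A KUBE-PORTALS-HOST -d 10.100.49.241/32 -p tcp --dport 9090 -j DNAT --to-destination 10.20.30.62:40000
-A KUBE-PORTALS-HOST -d 10.20.30.62/32 -p tcp --dport 9090 -j DNAT --to-destination 10.20.30.62:40000
# ...one identical rule for each remaining public IP...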
Normally we’d expect to see a rule in each block referencing only the service IP address of 10.100.49.241.  In this case, we see 4 more rules in each block, one for each public IP we defined in the service.  Notice how each rule is exactly the same with the exception of the destination IP address.  This tells us that the node will handle traffic destined to these 4 new IP addresses in exactly the same manner it handles traffic for the service IP.  So that’s awesome!  However, it’s only so awesome.  This setup means that each Kubernetes node can handle requests on port 9090 for the web80 service, but how do we handle that from a user perspective?  We don’t want users going straight to a Kubernetes node, since the nodes themselves should be considered ephemeral.  So we need another abstraction layer here to make this seamless to the end user.

This is where the external load balancer kicks in.  In my case, I chose HAproxy since there’s a pre-built Docker container available on Docker Hub.  Let’s run through the config and then I’ll circle back and talk about some specifics.  I need a Docker host to run the HAproxy container on, and since I want it ‘outside’ the Kubernetes cluster (that is, a host that isn’t running the kube-proxy service), I chose to just use the kubmasta host.  The first thing we need is a workable HAproxy config file.  The one I generated looks like this…

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    user haproxy
    group haproxy

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    option forwardfor
    option http-server-close
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    stats enable
    stats auth user:kubernetes
    stats uri /haproxyStats

frontend all
    bind *:80
    #Define the host we're looking for
    acl host_web80 hdr(host) -i web80.interubernet.local
    acl host_web8080 hdr(host) -i web8080.interubernet.local
    #Decide what backend pool to use for each host
    use_backend webservice80 if host_web80
    use_backend webservice8080 if host_web8080

backend webservice80
    balance roundrobin
    option httpclose
    option forwardfor
    server kubminion1 10.20.30.62:9090 check
    server kubminion2 10.20.30.63:9090 check
    server kubminion3 192.168.10.64:9090 check
    server kubminion4 192.168.10.65:9090 check
    option httpchk HEAD /index.html HTTP/1.0

backend webservice8080
    balance roundrobin
    option httpclose
    option forwardfor
    server kubminion1 10.20.30.62:9091 check
    server kubminion2 10.20.30.63:9091 check
    server kubminion3 192.168.10.64:9091 check
    server kubminion4 192.168.10.65:9091 check
    option httpchk HEAD /index.html HTTP/1.0

There’s quite a bit to digest here, and if you haven’t used HAproxy before this can be a little confusing.  Let’s hit the big items.  Under the defaults section I enable the statistics page and tell HAproxy which URI it should be accessible at.  I also define a username and password for authenticating to that page.

The next section defines a single frontend and two backend pools.  The frontend binds the service to all (*) interfaces on port 80.  This is important, and something I didn’t think about until I had a ‘duh, I’m running this in a container’ moment.  I initially tried to bind the service to a specific IP address, and that doesn’t work.  Since the host is running the default Docker network configuration, the IP assigned to the container is random and we’re going to have to use port mappings to get traffic into the container.  So binding to all available interfaces is really your only option (there are other options, but this is the easiest).  The remainder of the frontend section defines two ACLs used for the load balancing decision: host_web80 matches any request destined to ‘web80.interubernet.local’ and host_web8080 matches any request destined to ‘web8080.interubernet.local’.  This also implies that I have DNS records that look like this…

A – web80.interubernet.local – 10.20.30.61
A – web8080.interubernet.local – 10.20.30.61

Recall I mentioned that I’m using the default Docker network configuration, so when I run the container I’ll map the ports I need to ports on the host’s (kubmasta, 10.20.30.61) physical interface.

The backend sections define the pools.  As you can see, I define each Kubernetes minion as well as a specific health check for each server.  The default ‘check’ HAproxy uses is just a layer 4 port probe to the backend host.  While this ensures that the host is up and talking to the Kubernetes cluster, it does NOT ensure that the pods we want to talk to are actually running.  Recall that when we define a service, the rules get pushed to all of the hosts and Kubernetes starts searching for pods with labels that match the service’s label selector.  If there are no available pods, the layer 4 port check will still succeed.  For that reason, we define a layer 7 health check that verifies the index.html file exists.
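If you want to approximate that layer 7 check by hand, something like this against one of the public IPs should come back with a 200 once a matching pod is running (curl is just standing in for HAproxy’s HEAD request here)…

# Mimic the HAproxy httpchk against one of the minions
curl -I http://10.20.30.62:9090/index.html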

The HAproxy container can pull a custom configuration in by mapping a volume.  In my case, I created a folder (/root/haproxy-config/) and put my configuration file (haproxy.cfg) in it.  Then to run the container I used this command…

docker run -d -p 80:80 -v ~/haproxy-config:/haproxy-override dockerfile/haproxy

In addition to mapping the volume, I also map port 80 on the host (10.20.30.61) to port 80 on the container.  Once your host downloads the image, you should be able to verify it’s running and that port 80 has been mapped…

[Image: docker ps output showing the running HAproxy container with port 80 mapped to the host]
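For reference, a couple of quick checks along those lines look something like this (the container ID and name will differ on your host)…

# Confirm the container is up and that port 80 is published on the host
docker ps
# The stats page defined in the config should also answer, using the credentials from the defaults section
curl -u user:kubernetes http://10.20.30.61/haproxyStats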
So now that we have the HAproxy configuration in place, let’s define the service we listed above as well as a second service that looks like this…

id: "webfrontend2"
kind: "Service"
apiVersion: "v1beta1"
port: 9091
containerPort: 8080
publicIPs: ["10.20.30.62", "10.20.30.63", "192.168.10.64", "192.168.10.65"]
selector:
  name: "web8080"
labels:
  name: "webservice"

Once both services are defined, the next step is to define the replication controllers for the backend pods.  Before we do that, let’s take a quick look at the HAproxy dashboard to see what it sees…

[Image: HAproxy stats dashboard showing both backend pools down since no pods exist yet]
So it looks like HAproxy doesn’t see any of the backend pool members as being available.  We haven’t deployed any of our pods yet, so this is normal.  Keep in mind that if we weren’t doing the HTTP check on the backends, these would show as up since the Kubernetes service is already in place on the cluster.  We’ll define the backend pods through a Kubernetes replication controller.  Let’s start with the first one…

id: web-controller
apiVersion: v1beta1
kind: ReplicationController
desiredState:
  replicas: 1
  replicaSelector:
    name: web80
  podTemplate:
    desiredState:
      manifest:
        version: v1beta1
        id: webpod
        containers:
          - name: webpod
            image: jonlangemak/docker:web_container_80
            ports:
              - containerPort: 80
    labels:
      name: web80

This shouldn’t look new.  Nothing special here, except for the fact that we’re only deploying a single replica.  Let’s deploy it to the cluster and then check back with HAproxy…
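Assuming the controller definition above is saved as web-controller.yaml (again, my filename), deploying it is a single command…

kubectl create -f web-controller.yaml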

[Image: HAproxy stats dashboard showing all 4 nodes in the webservice80 backend pool up]
We can now see that the entire backend pool shows as online.  Are you confused as to why it shows the service up on all 4 nodes when there should only be a single pod running on one of them?  Remember, we’re still using services, so the pod appears to be present on all 4 nodes.  Let’s look at a diagram to show you what I mean…

[Image: diagram of the traffic flow from the HAproxy container through a Kubernetes node’s netfilter rules and kube-proxy to the single web80 pod]
In the case of the health checks, the probes come from the HAproxy container and are sent to what it believes are the backend servers that will service the requests.  That’s not really the case.  What we really have is two layers of load balancing.  The request comes in from HAproxy on port 9090, which is the port we defined in our service (red line).  The Kubernetes host receives the traffic, netfilter catches it and redirects it to the random port the Kubernetes proxy service is listening on (orange line).  The Kubernetes proxy knows that there’s currently only one pod matching its label selector, so it sends the traffic directly to that pod on port 80.  This causes HAproxy to think that all 4 Kubernetes nodes are hosting the service it’s looking for when it’s really only running on kubminion1.
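One quick way to convince yourself of this is to ask the cluster where the pod actually landed.  In these early builds the output columns vary a bit, but a pod listing shows the host each pod is scheduled on (the -l selector flag is shown here on the assumption your kubectl version supports it)…

# List pods matching the web80 selector; the host column shows which minion is really running the replica
kubectl get pods -l name=web80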

Now let’s create our second replication controller for the web8080 traffic…

id: web-controller-2
apiVersion: v1beta1
kind: ReplicationController
desiredState:
  replicas: 2
  replicaSelector:
    name: web8080
  podTemplate:
    desiredState:
      manifest:
        version: v1beta1
        id: webpod
        containers:
          - name: webpod
            image: jonlangemak/docker:web_container_8080
            ports:
              - containerPort: 8080
    labels:
      name: web8080

Once we deploy this controller to the cluster we can check HAproxy again…

[Image: HAproxy stats dashboard showing both the webservice80 and webservice8080 backend pools up]
Now HAproxy believes that both backends are up.  Let’s do a couple of quick tests to verify things are working as we expect.  Recall that we need to use the DNS names for this to work since that’s how we’re mapping traffic to the correct backend pool…

[Images: browser tests against web80.interubernet.local and web8080.interubernet.local, each returning the page served by the matching pod]
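If you’d rather test from the command line, hitting the HAproxy host with an explicit Host header exercises the same ACLs even without the DNS records in place (what comes back depends on what the web_container images serve)…

# Should land on the webservice80 backend (pods listening on 80)
curl -H "Host: web80.interubernet.local" http://10.20.30.61/
# Should land on the webservice8080 backend (pods listening on 8080)
curl -H "Host: web8080.interubernet.local" http://10.20.30.61/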
So things seem to be working as expected.  Let’s wrap up by showing what the flow would look like for each of these user connections on our diagram…

[Image: diagram showing the end-to-end flow for the web80 and web8080 requests through HAproxy, the node netfilter rules, kube-proxy, and the pods]
So as you can see, what’s really happening isn’t super straightforward.  Adding a second layer of load balancing with HAproxy certainly makes this a little more confusing, but it also gives you a lot of resiliency and flexibility.  For instance, we can keep deploying services in this manner until we run out of ports.  The web80 service used port 9090 and web8080 used 9091.  I could very easily define another service in Kubernetes on port 9092 (or any other port) and then create a new HAproxy frontend rule and associated backend pool.
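To make that concrete, here’s a sketch of what the HAproxy side of a hypothetical third service on port 9092 would look like (the web9092 name and hostname are made up; the matching Kubernetes service would simply use port 9092 and the same publicIPs)…

    # Added to the existing 'frontend all' section
    acl host_web9092 hdr(host) -i web9092.interubernet.local
    use_backend webservice9092 if host_web9092

backend webservice9092
    balance roundrobin
    option httpclose
    option forwardfor
    server kubminion1 10.20.30.62:9092 check
    server kubminion2 10.20.30.63:9092 check
    server kubminion3 192.168.10.64:9092 check
    server kubminion4 192.168.10.65:9092 check
    option httpchk HEAD /index.html HTTP/1.0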

Like I said, this is just one way to do it, and I’m not even convinced it’s the ‘right’ way to solve this.  One of the key benefits of Kubernetes is that it reduces the need for excessive port mapping, and using HAproxy in this manner reintroduces some of that port madness.  However, I think it’s fairly minimal and much easier to scale than doing port mapping at the host level.

I’m anxious to hear how others solved this and whether anyone has any feedback.  Thanks for reading!

13 thoughts on “Kubernetes 101 – External access into the cluster”

  1. Ufuk Altinok

    This is a great 101, however having two load balancers doesn’t seem to be a solid solution IMO. Kubernetes is moving really fast and it isn’t easy to keep up with the pace. I’m currently thinking about a solution involving an API load balancer such as vulcand working with the kube API server? What do you think about that?

    1. Jon Langemak (post author)

      For sure. That’s totally the direction we want to go. vulcand is a great direction, but there’s no reason we couldn’t do something similar with our current configuration. Leveraging a tool like confd we could automate the provisioning of the HAproxy configuration.

      I totally agree that doing this manually isn’t ‘web scale’. But if you’re looking for a way to do this in a small environment or test bed this is a good start. I’m not sure of any other ways to get around the two load balancer thing. I mean, we want to leverage the Kubernetes constructs like services for pod discovery.

      Do you have any ideas on how to do this without services and without sending traffic right to a pod?

      Thanks!

  2. Maple Wang

    Hi Jon,

    Thanks for your post, it’s very useful for my getting-my-feet-wet trial. But there are still some questions for you:

    1. Kubernetes always claims it provides a primitive load balancer function, but I don’t know how to use it. What I know is either using GCE or relying on another external load balancer like HAProxy as in your post. Where is Kubernetes’s original load balancer?

    2. In your example, IP+port is bound in the HAProxy backend, which means that once a new service is created, the HAProxy configuration needs to be updated; that’s obviously not flexible. I’m not familiar with HAProxy. Is there a way to only bind the IP address in the backend with a port range? For example, binding the four minions with port range 1000~2000 in the backend, so that when a user visits 10.20.30.61:1001 (master/HAProxy), the request will be forwarded to port 1001 on one of the four backend minions.

    best regards

  3. Mark Betz

    This is an absolutely phenomenal series of posts. Thanks so much for the effort that obviously went into the setup, the experiments and the writing. I’ve been using Docker for nearly two years and I don’t think I fully understood the networking model until I read these through from the beginning. We’ve been using kubernetes on GCE for about a month now, and I definitely did not understand the networking model there, so again, thank you. I was struggling not so much with the tools available to permit ingress from the outside world to our services, but rather with why they are what they are. You cleared it right up. I did have to read certain parts twice, but I don’t think that was your fault :).

  4. Michael

    Jon, FANTASTIC WORK!!!! Really, I think most DEVs love K8s for its abstraction but leave the heavy lifting (HA, networking, security, auth, etc.) to the OPS guys, which usually are totally here. Your work really really helps a lot. Please continue with this series, esp. since K8s and Docker are moving fast in the networking space (docker network).

  5. Roman

    Nice post, but I’m not convinced this would work in production. There is a problem right at the top: the DNS name has to be integrated with the load balancer so that it actually resolves to the correct set of IPs. That way the one small green HAproxy would itself be an HA/balanced service. And then it becomes tricky to manage it all automatically.

  6. Christopher J. Ruwe

    Hi Jon,

    thanks for your series of posts which helped a lot to dive into Kubernetes operation.

    You can pull the IPs for the pods from etcd when you have defined a service over these pods. Taking the service name as the key, you can pull it like

    etcdctl get /registry/services/endpoints/default/.

    Taking confd to build a haproxy.cfg for instance, the template looks like

    {{$data := json (getv "/registry/services/endpoints/default/")}}
    {{range $data.subsets}}
    {{range .addresses}}
    server {{.ip}} check
    {{end}}
    {{end}}

    Cheers and thanks again

    1. Christopher J. Ruwe

      Hi Jon,

      just saw that wordpress prunes the meta-variables I used.

      It is meant to be etcdctl get /registry/services/endpoints/default/THESERVICENAME

      and accordingly

      {{$data := json (getv "/registry/services/endpoints/default/THESERVICENAME")}}
      {{range $data.subsets}}
      {{range .addresses}}
      server {{.ip}} check
      {{end}}
      {{end}}

    2. Ron

      Hi Chris,

      I have configured 3 Kubernetes masters with etcd. I have installed confd and haproxy. I am looking for some guidance on updating haproxy dynamically. Do you have any basic template?

