Kubernetes at DH: A journey from YAML headaches to Helm bliss

21.06.19 by Max Williams

5 min read

You must have been living under a rock if you haven’t heard the Kubernetes (k8s) hype over the last few years. It has quickly become the platform of choice for running applications at many modern tech companies, and Delivery Hero is one of them. What is unique about DH is that some of our tech teams are very isolated from each other, sometimes even on different continents, speaking different languages, with different applications trying to solve different problems.

While this broad diversity in technology can have its challenges, it has also allowed us to collectively try various approaches to solving common problems. In this article, I will share some of this collective experience and detail how a few tools can solve some common problems when running k8s at scale:

  1. Managing many k8s resources without headaches
  2. Managing k8s resources in a stateful way
  3. Previewing changes to k8s resources before applying them
  4. Dealing with sensitive data in k8s

When our teams first started using k8s, like most people, they started with a bunch of YAML files. While this is great in the beginning, once you have multiple clusters and many applications in many environments, simple YAML files can quickly become too difficult to manage because of the sheer number of files required. Consider what resources you might want for a common application: Deployment, Service, Ingress, ConfigMap, Secret, HorizontalPodAutoscaler and maybe a CronJob or two. With all these files, you will end up with a tonne of duplicated code, and it can be tricky to keep values in sync across multiple files. In the beginning, some of our teams solved this problem with some simple shell scripting or automation, such as using sed or envsubst to manipulate YAML files, or with a more advanced templating solution like Sprig or Jinja. But the fast-paced k8s ecosystem has quickly converged on a single tool for this…

Helm

Helm charts allow you to package your application in a common way, with text templating and some basic programming constructs such as conditionals and loops. Almost every team in DH is now packaging their applications in Helm charts. And with the public charts repository, Helm is also the go-to solution for installing cluster-level tools, such as cluster-autoscaler, nginx-ingress, metrics-server and many more. DH engineers have even created some of these public charts, for example the cluster-overprovisioner.
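As a minimal sketch of what this templating looks like, the hypothetical chart template below renders an Ingress only when a flag is set, and loops over a list of hostnames. The file name and the ingress.enabled and ingress.hosts value keys are assumptions made up for illustration; they are not taken from any DH chart.

```yaml
# templates/ingress.yaml (hypothetical chart template, for illustration only)
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Release.Name }}-ingress
spec:
  rules:
  # One rule per hostname listed under ingress.hosts in the values file
  {{- range .Values.ingress.hosts }}
  - host: {{ . | quote }}
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            # "$" reaches the top-level scope from inside the range loop
            name: {{ $.Release.Name }}
            port:
              number: 80
  {{- end }}
{{- end }}
```

With a values file per environment, the same template can then produce different resources for staging and production without duplicating the YAML.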

Once you become a convert to the religion of Helm, some problems become apparent. Which charts are installed on which clusters? What version of the Prometheus chart is running on the staging cluster? How can I install and upgrade multiple charts together and programmatically? This is where the second tool comes in…

Helmfile

Helmfile lets you specify your charts in a file together with their Helm values files, associate them with a specific kubectl cluster context, and then sync them to the cluster with a single command. Here’s an example for some applications:

context: eks_staging

releases:
- chart: charts/apps/event-processor
  name: staging-event-processor
  values:
  - charts/apps/event-processor/values/staging/values.yaml

- chart: charts/apps/delivery-manager
  name: staging-delivery-manager
  values:
  - charts/apps/delivery-manager/values/staging/values.yaml

Then we can sync these charts to the cluster eks_staging:

helmfile --file helmfiles/staging.yaml sync

Helmfile is also a great tool to bootstrap a fresh cluster after creation. For example, we have a standard set of charts we install on every cluster: cluster-autoscaler, fluentd, nginx-ingress, metrics-server, external-dns, oauth2-proxy, prometheus, cluster-overprovisioner and node-problem-detector. We specify these charts and pin them to specific versions in a separate Helmfile and place this file in git. This allows us to track and upgrade charts across multiple clusters in a safe way, for example testing newer chart versions on non-live clusters first.
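As an illustration, a pinned entry in such a Helmfile might look like the sketch below. The chart, release name, version and values path are taken from the helm-diff output shown later in this article; the exact layout of the real file is an assumption.

```yaml
# helmfiles/staging/infra.yaml (excerpt, reconstructed as a sketch)
context: eks_staging

releases:
- chart: stable/nginx-ingress
  name: ingress01
  # Pinning the chart version makes upgrades an explicit, reviewable change in git
  version: 1.0.1
  values:
  - nginx-ingress/staging/values.yaml
```

Upgrading then becomes a one-line change to the version field, which can be rolled out to a non-live cluster's Helmfile first.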

So now we have our applications neatly packaged into charts and we manage the charts in a stateful way using Helmfile, but how can we be sure what’s going to happen when syncing our charts? Helm just sends the data to the k8s API and often it’s not clear what is going to happen. Here we use a Helm plugin…

helm-diff

helm-diff gives you a colour-coded diff of the changes Helm will apply. Very handy! This allows your workflow to be a little more like Terraform, another tool we use every day here at DH. Here’s an example showing a change in a ConfigMap for nginx-ingress:

$ helmfile --selector name=ingress01 --file helmfiles/staging/infra.yaml diff
exec: helm diff upgrade --allow-unreleased ingress01 stable/nginx-ingress --version 1.0.1 --values nginx-ingress/staging/values.yaml --kube-context eks_staging
default, ingress01-nginx-ingress-controller, ConfigMap (v1) has changed:
  # Source: nginx-ingress/templates/controller-configmap.yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    labels:
      app: nginx-ingress
      chart: nginx-ingress-1.0.1
      component: "controller"
      heritage: Tiller
      release: ingress01
    name: ingress01-nginx-ingress-controller
  data:
    enable-vts-status: "true"
    log-format-escape-json: "true"
    proxy-next-upstream: error timeout http_502
-   proxy-next-upstream-tries: "3"
+   proxy-next-upstream-tries: "2"
    use-geoip: "true"
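A diff like this would typically come from a small edit to the chart's values file. As a sketch, assuming the ConfigMap entries are set through the controller.config map of the stable/nginx-ingress chart, the relevant excerpt of the values file might look like this after the change (the actual contents of our file are not shown in this article):

```yaml
# nginx-ingress/staging/values.yaml (excerpt; contents assumed for illustration)
controller:
  config:
    enable-vts-status: "true"
    log-format-escape-json: "true"
    proxy-next-upstream: error timeout http_502
    proxy-next-upstream-tries: "2"   # previously "3"
    use-geoip: "true"
```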

So now we have a nice workflow for dealing with k8s resources, but how can we handle sensitive data in our Helm charts, like API keys and credentials?

helm-secrets

The fourth and final tool we will look at here is also a Helm plugin: helm-secrets.

How many times have you seen API keys or passwords in clear text in git? Or used some painful custom encryption tool based on GPG or similar? Or had to separate sensitive data entirely from Helm charts? There are many different tools to solve these types of problems, but I have seen few that integrate as well with the workflow covered in this article. The helm-secrets plugin integrates with Helmfile, so encrypted chart values files are seamlessly decrypted locally using a key from the AWS or GCP KMS service, applied to the cluster using Helm and then deleted. This has some significant benefits:

  • The encrypted values files can be safely stored in git
  • The decrypted values are never stored permanently, as they are deleted once they have been applied
  • Authentication to your cloud provider is required for decryption
  • The YAML keys in the values file are left unencrypted, so pull request diffs remain readable while the sensitive values stay obfuscated
  • Encrypted values in the Helm chart can be used like any other key/value

Here is an updated Helmfile entry from before, but with an encrypted values file added:

context: eks_staging

releases:
- chart: charts/apps/event-processor
  name: staging-event-processor
  values:
  - charts/apps/event-processor/values/staging/values.yaml
  secrets:
  - charts/apps/event-processor/values/staging/secrets.yaml

And the secrets.yaml file might look like this on disk, with some of the metadata removed:

secrets:
    API_KEY: ENC[AES256_GCM,data:xxxxxxxxx=,tag:xxxxxx==,type:str]
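Once decrypted, the value can be referenced from a chart template like any other value. The template below is a hypothetical sketch of how that might look; the file name and the Secret's name are assumptions, and only the secrets.API_KEY key comes from the example above.

```yaml
# templates/secret.yaml (hypothetical chart template, for illustration only)
apiVersion: v1
kind: Secret
metadata:
  name: {{ .Release.Name }}-secrets
type: Opaque
data:
  # Secret data must be base64 encoded; b64enc is a template function available in Helm
  API_KEY: {{ .Values.secrets.API_KEY | b64enc | quote }}
```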

You will also need to add a .sops.yaml file somewhere so helm-secrets knows what KMS key to use. Once this is in place and you have access to the KMS key in your cloud provider, you can simply sync the chart in the same way as normal and the values in secrets.yaml will be decrypted seamlessly.
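A minimal .sops.yaml might look like the sketch below. The path pattern and the KMS key ARN are placeholders made up for illustration; substitute the details of your own key.

```yaml
# .sops.yaml (hypothetical; the ARN below is a placeholder, not a real key)
creation_rules:
# Apply this rule to any file whose path ends in secrets.yaml
- path_regex: secrets\.yaml$
  kms: 'arn:aws:kms:eu-west-1:111111111111:key/00000000-0000-0000-0000-000000000000'
```

For a GCP key, the rule would use a gcp_kms field with the key's resource ID in place of the kms field.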

Conclusion

The benefits of k8s can be quickly outweighed by management overhead if you are not careful. But by using some simple tools we can make life easier, changes safer and data more secure.

If you enjoyed this article and enjoy working with Kubernetes, then perhaps you would be interested in the following role, where you would be using Kubernetes on a daily basis to run our discovery and recommendation system.

We also have a variety of other exciting positions, feel free to check them out here.

Max Williams
Senior Principal Systems Engineer