Over the last year, our Restaurant Partner Solutions (RPS) vertical went through a massive cultural change when our container orchestration migrated from ECS to EKS. Today, YAML + Terraform is the language spoken in our tribe. Before that, we spoke JSON + CloudFormation, and it took roughly one week to create a new service. Since migrating to EKS, we can create a service in a matter of hours, and our developers are more empowered as we promote the “you build it, you run it” ethos.
We are moving faster than ever: RPS spans five regions and is quite stable while processing more than one million orders a day, but what about security? This post is the first of a series where we will share the resources we are using to make sure security is a crucial aspect of all levels of our RPS tech.
The need for container security & compliance
Besides our five regional production clusters, we also have staging, load-test and infrastructure clusters, many of them running images from different repositories and sources. How do we tackle security and compliance for those images, whether they are images we build ourselves for our apps or core parts of our Kubernetes ecosystem?
We are using anchore-engine, an open-source tool that scans container images for known vulnerabilities. It catalogs everything inside the container (packages, software dependencies, libraries and permissions) and scans file content for secrets. We can then evaluate the results against user-defined policies. For example, you can block containers that contain known vulnerabilities or AWS access keys.
Anchore and anchore-engine OSS
The legacy Anchore toolset was open-sourced in 2016 as an all-in-one solution shipped in a single Docker container that provided scan results only via its command line.
Anchore launched anchore-engine in 2017 with a REST API and support for the components of the enterprise offer, such as the UI, reporting and dashboards. Since we use the open-source anchore-engine alone, we do not have those enterprise components. We needed to integrate it with our team’s Slack notifications, so we developed an errbot plugin that interacts with the API and delivers the results we need. We use the public charts for anchore-engine and the Anchore admission controller, together with a lambda function and the errbot plugin (we will touch on them later), to integrate it into our workflow, as we can see in the following diagram.
Anchore policy bundles (structured as JSON documents) are the unit of policy definition and evaluation. It is possible to create multiple policy bundles, but only one can be marked as ‘active’ for evaluation. An Anchore policy is expressed as a bundle that comprises a set of rules to evaluate an image. Those rules define checks against an image for the following aspects:
- Security vulnerabilities
- Package whitelists and blacklists
- Configuration file contents
- Presence of credentials in an image
- Image manifest changes
- Exposed ports or Dockerfile CMD
Anchore policies return a pass or fail decision result.
To better gauge the power of Anchore’s policies, check out these examples of Docker CIS and DoD policy bundles, as well as the complete list of gates.
The policy editor is only available in the Enterprise offer, so we need to edit the JSON manually.
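To make the shape of a hand-edited bundle more concrete, here is a minimal sketch in Python that builds one and prints it as JSON. It is an illustration, not our production policy: the IDs and names are hypothetical, and the field, gate and trigger names (`vulnerabilities`/`package`, `secret_scans`/`content_regex_checks`, `policy_ids`) are assumptions based on our reading of the anchore-engine documentation, so check them against the list of gates for your version.

```python
import json


def make_rule(rule_id, gate, trigger, action, params):
    """Build one rule; params is a plain dict turned into a name/value list."""
    return {
        "id": rule_id,
        "gate": gate,
        "trigger": trigger,
        "action": action,  # assumed to be one of STOP, WARN or GO
        "params": [{"name": k, "value": v} for k, v in params.items()],
    }


def make_bundle():
    """Return a bundle with two rules, mapped to every image (all names hypothetical)."""
    policy = {
        "id": "rps-example-policy",
        "name": "RPS example policy",
        "version": "1_0",
        "rules": [
            # Fail images that contain vulnerabilities of severity high or above.
            make_rule("rule-high-vulns", "vulnerabilities", "package", "STOP",
                      {"package_type": "all",
                       "severity_comparison": ">=",
                       "severity": "high"}),
            # Fail images whose file content matches the AWS access key regex.
            make_rule("rule-aws-keys", "secret_scans", "content_regex_checks", "STOP",
                      {"content_regex_name": "AWS_ACCESS_KEY"}),
        ],
    }
    return {
        "id": "rps-example-bundle",
        "name": "RPS example bundle",
        "version": "1_0",
        "policies": [policy],
        "whitelists": [],
        # One catch-all mapping: apply the policy to any registry, repository and tag.
        "mappings": [{
            "id": "map-all-images",
            "name": "default",
            "registry": "*",
            "repository": "*",
            "image": {"type": "tag", "value": "*"},
            "policy_ids": [policy["id"]],
            "whitelist_ids": [],
        }],
        "whitelisted_images": [],
        "blacklisted_images": [],
    }


if __name__ == "__main__":
    print(json.dumps(make_bundle(), indent=2))
```

Generating the bundle from code rather than editing raw JSON keeps the repetitive name/value parameter lists out of sight and makes the bundle easy to review in a pull request.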
Kubernetes admission controllers are plugins that control how the cluster handles authenticated requests; they may change or deny a request based on their own logic. There are two types of admission controllers, and each type runs in a different admission phase: Mutating or Validating. As you can probably guess, we use the Anchore evaluation during the Validating phase to check the image before accepting it.
You can read more about the admission controller in the docs and check the list of plugins supported by each EKS version.
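As a rough illustration of what happens in the Validating phase, the sketch below takes an `AdmissionReview` request for a Pod and builds the allow or deny response. It is not the Anchore admission controller's code: `image_passes` is a hypothetical callable standing in for the policy evaluation, and only the `admission.k8s.io/v1` request and response fields are taken from the Kubernetes API.

```python
from typing import Callable, Dict, List


def images_in_pod(pod: Dict) -> List[str]:
    """Collect the image references of all init and regular containers in a Pod."""
    spec = pod.get("spec", {})
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    return [c["image"] for c in containers]


def review(admission_review: Dict, image_passes: Callable[[str], bool]) -> Dict:
    """Build the AdmissionReview response: deny if any image fails the check."""
    request = admission_review["request"]
    failed = [img for img in images_in_pod(request["object"])
              if not image_passes(img)]
    response = {"uid": request["uid"], "allowed": not failed}
    if failed:
        response["status"] = {
            "message": "images failed policy evaluation: " + ", ".join(failed)
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    # Hypothetical request and a stub check that only accepts one registry.
    incoming = {
        "request": {
            "uid": "example-uid",
            "object": {"spec": {"containers": [{"image": "docker.io/library/nginx:latest"}]}},
        }
    }
    print(review(incoming, lambda image: image.startswith("registry.example.com/")))
```

A validating webhook only answers yes or no; it never rewrites the Pod, which is why the image check fits this phase rather than the Mutating one.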
The Anchore admission controller can enforce that an image either passes the policy or has at least been analyzed. This is an example notification of a policy evaluation that we have in Slack:
We created a lambda function that receives the Anchore webhook notification, formats it and posts a message to Slack for the following events:
- Analysis updated
- Vulnerability updated
- Policy Evaluation
- Tag updated
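The flow for these four events can be sketched roughly as follows. This is not our actual lambda: the payload field names (`data`, `notification_type`, `notification_payload`, `subscription_key`), the API Gateway style `event["body"]` and the `SLACK_WEBHOOK_URL` environment variable are all assumptions made for the example. The one thing taken as given is that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
import json
import os
import urllib.request

# Assumed Anchore notification types mapped to the titles shown in Slack.
EVENT_TITLES = {
    "analysis_update": "Analysis updated",
    "vuln_update": "Vulnerability updated",
    "policy_eval": "Policy evaluation",
    "tag_update": "Tag updated",
}


def to_slack_message(notification: dict) -> dict:
    """Turn an Anchore notification (assumed shape) into a Slack message body."""
    data = notification.get("data", {})
    payload = data.get("notification_payload", {})
    event_type = data.get("notification_type", "unknown")
    title = EVENT_TITLES.get(event_type, "Unknown event (%s)" % event_type)
    image = payload.get("subscription_key", "unknown image")
    return {"text": "*%s*: `%s`" % (title, image)}


def post_to_slack(message: dict, webhook_url: str) -> int:
    """POST the message to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def handler(event, context):
    """Lambda entry point, assuming the webhook arrives as a JSON string in event['body']."""
    message = to_slack_message(json.loads(event["body"]))
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical variable name
    if webhook_url:
        post_to_slack(message, webhook_url)
    return {"statusCode": 200, "body": json.dumps(message)}
```

Keeping the formatting in a pure function (`to_slack_message`) makes it possible to unit-test the message layout without touching Slack.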
This is an example of a notification when a given image has its vulnerabilities updated:
Our bot currently supports two commands to interact with Anchore. They search all images for a given vulnerability ID or for an installed package. Anchore supports Node.js npm, Ruby gem, Java and OS packages.
1. Search images for vulnerability ID:
2. Search images for installed packages:
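Behind each of the two commands there is essentially one API query. The sketch below only builds the request URLs and is not the plugin itself: the `/query/images/by_vulnerability` and `/query/images/by_package` paths and their `vulnerability_id` and `name` parameters are our assumption of what the anchore-engine v1 API exposes, and the base URL is hypothetical, so verify both against the API specification of your deployment.

```python
from urllib.parse import urlencode


def images_by_vulnerability_url(base_url: str, vulnerability_id: str) -> str:
    """URL to list images affected by a vulnerability ID such as a CVE."""
    query = urlencode({"vulnerability_id": vulnerability_id})
    return "%s/query/images/by_vulnerability?%s" % (base_url.rstrip("/"), query)


def images_by_package_url(base_url: str, package_name: str) -> str:
    """URL to list images that have a given package installed."""
    query = urlencode({"name": package_name})
    return "%s/query/images/by_package?%s" % (base_url.rstrip("/"), query)


if __name__ == "__main__":
    base = "http://anchore.example.com:8228/v1"  # hypothetical endpoint
    print(images_by_vulnerability_url(base, "CVE-2019-5021"))
    print(images_by_package_url(base, "openssl"))
```

A bot command then only needs to issue an authenticated GET against the URL and format the returned image list for chat.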
There is some ongoing work and pending tasks to improve our current setup:
- Multi-tenant notifications
- Add $scan <image> command to our bot
- Open-source the lambda for Slack notifications
- Open-source the Errbot plugin for interacting with the Anchore API
This was the first part of a series on how we are dealing with security in a fast-paced environment. We touched a little on how we handle security and compliance for our Docker container images. Anchore, together with the admission controller, has proven to be a great solution to help us achieve that.
Stay tuned for our next post where we will cover other awesome solutions we are working with!
As always, if you’re interested in pursuing a career as a Systems Engineer in RPS at Delivery Hero, have a look at our open position: