Securing GitLab + Docker CI Pipelines

Cesar Talledo

October 21, 2020

Intro

Continuous integration (CI) jobs often need to interact with Docker, whether to build Docker images, deploy Docker containers, or both.

One of the most popular DevOps tools for CI is GitLab, as it offers a complete suite of tools for the DevOps lifecycle. While GitLab is an excellent tool suite, it offers weak security when running CI jobs that require interaction with Docker.

Now, you may ask, why do I need security for my CI jobs?

Because the security weaknesses I describe allow CI jobs to perform root-level operations on the machine where the job executes, compromising the stability of the CI infrastructure and possibly beyond.

This article explains these security issues and shows how the new open-source Sysbox container runtime, developed by Nestybox, can be used to harden the security of these CI jobs while at the same time empowering users to create powerful CI configurations with Docker.

TL;DR

This article is a bit long, as the first half gives a detailed explanation of the security-related problems for GitLab jobs that require interaction with Docker.

If you understand these problems already, you may want to jump directly to section Solution: Using Docker + Sysbox.

Contents

  • Security Problems with GitLab + Docker
  • Security issues with the Shell Executor
  • Security issues with the Docker Executor
  • Binding the host Docker Socket into the Job Container
  • Using a Docker-in-Docker Service Container
  • Solution: Using Docker + Sysbox
  • A Secure DinD Service Container
  • GitLab Runner & Docker in a Container
  • Inner Docker Image Caching
  • Conclusion

Security Problems with GitLab + Docker

It is common for CI jobs to require interaction with Docker, often to build container images and/or to deploy containers. These jobs are typically composed of steps executing Docker commands such as docker build, docker push, or docker run.

In GitLab, CI jobs are executed by the “GitLab Runner”, an agent that is installed on a host machine and executes jobs as directed by the GitLab server (which normally runs on a separate host).

The GitLab runner supports multiple “executors”, each of which represents a different environment for running the jobs.
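
For reference, the executor is chosen when the runner is registered. A minimal non-interactive registration looks roughly like this (a sketch only; the description and token are placeholders):

$ gitlab-runner register \
    --non-interactive \
    --url "https://gitlab.com/" \
    --registration-token "REGISTRATION_TOKEN" \
    --description "my-runner" \
    --executor "docker" \
    --docker-image "docker:19.03.12"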

For CI jobs that interact with Docker, GitLab recommends one of the following executor types:

  • The Shell Executor
  • The Docker Executor

Both of these, however, suffer from weak security for jobs that interact with Docker: such jobs can easily gain root-level access to the machine where they execute, as explained below.

Security issues with the Shell Executor

When using the shell executor, the CI job is composed of shell commands executed in the same context as the GitLab runner.

The diagram below shows the context in which the job executes:

A sample .gitlab-ci.yml file looks like this:

build_image:
  script:
    - docker build -t my-docker-image .
    - docker run my-docker-image /script/to/run/tests

The shell executor is powerful due to the flexibility of the shell, but it’s insecure for Docker jobs: it requires that the GitLab runner user be added to the docker group, which in essence grants the job root-level access on the runner machine.
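
Concretely, the setup that creates this exposure is typically a one-liner on the runner machine (assuming the runner runs under the default gitlab-runner account):

$ sudo usermod -aG docker gitlab-runner    # lets the runner user talk to the host's Docker daemon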

For example, the CI job could easily take over the runner machine by executing a command such as docker run --privileged -v /:/mnt alpine <some-cmd>. In such a container, the job has unfettered root-level access to the entire filesystem of the runner machine via the /mnt directory in the container.
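
As a hypothetical illustration, a single job step along these lines is enough to read files that only root on the host should be able to see:

$ docker run --rm --privileged -v /:/mnt alpine cat /mnt/etc/shadow    # dumps the host's password hashes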

The shell executor also suffers from a couple of other functional problems:

1) The job executes within the GitLab runner’s host environment, which may or may not be clean (e.g., depending on the state left by prior jobs).

2) Any job dependencies must be pre-installed on the runner machine.

For these reasons, developers often prefer the GitLab Docker Executor, but unfortunately it’s also insecure (see the next section).

Security issues with the Docker Executor

When using the Docker executor, the CI job runs within one or more Docker containers. This solves the functional problems of the shell executor described in the prior section because you get a clean environment prepackaged with your job’s dependencies.

However, if the CI job needs to interact with Docker itself (e.g. to build Docker images and/or deploy containers), things get tricky.

For such a job to run, it needs access to a Docker engine, or it must use a tool like Kaniko that builds Docker images without a Docker engine.

Kaniko works well if the CI job only needs to build Docker container images from a Dockerfile. But it does not help if the job needs to perform deeper interactions with Docker (e.g., to run containers, to run Docker Compose, to deploy a Kubernetes-in-Docker cluster, etc.)
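
For completeness, a Kaniko-based build step looks roughly like this (it runs inside the gcr.io/kaniko-project/executor image; the destination shown is just a placeholder built from GitLab's predefined CI variables):

/kaniko/executor \
    --context "$CI_PROJECT_DIR" \
    --dockerfile "$CI_PROJECT_DIR/Dockerfile" \
    --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"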

For CI jobs that need to interact with Docker, the job needs access to a Docker engine. GitLab recommends two ways to do this:

1) Binding the host’s Docker socket into the job container

2) Using a Docker-in-Docker (DinD) “service” container

Unfortunately, both of these are insecure setups that easily allow the job to take control of the runner machine, as described below.

Binding the host Docker Socket into the Job Container

This setup is shown below.

A sample .gitlab-ci.yml file looks like this:

image: docker:19.03.12

build:
  stage: build
  script:
    - docker build -t my-docker-image .
    - docker run my-docker-image /script/to/run/tests

As shown in the diagram above, the Docker container running the job has access to the host machine’s Docker daemon via a bind-mount of /var/run/docker.sock.

To do this, you must configure the GitLab runner as follows (pay attention to the volumes clause):

[[runners]]
  url = "https://gitlab.com/"
  token = REGISTRATION_TOKEN
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "docker:19.03.12"
    privileged = false
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]

This is the so-called “Docker-out-of-Docker” (DooD) approach: the CI job and Docker CLI run inside a container, but the commands are executed by a Docker engine at host level.

From a security perspective, this setup is not good: the container running the CI job has access to the Docker engine on the runner machine, in essence granting the CI job root-level access on that machine.

For example, the CI job can easily gain control of the host machine by creating a privileged Docker container with a command such as docker run --privileged -v /:/mnt alpine <some-cmd>. Or the job can remove all containers on the runner machine with a simple docker rm -f $(docker ps -a -q) command.

In addition, the DooD approach also suffers from context problems: the Docker commands are issued from within the job container, but are executed by a Docker engine at host level (i.e., in a different context). This can lead to collisions among jobs (e.g., two jobs running concurrently may collide by creating containers with the same name). Also, mounting files or directories to the created containers can be tricky since the contexts of the job and Docker engine are different.
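
A hypothetical illustration of both problems (the container name and paths are made up):

# Both commands run inside the job container, but the host's Docker engine executes them.
$ docker run -d --name helper alpine sleep 300     # a concurrent job issuing the same command
                                                   # fails: "helper" already exists on the shared
                                                   # host engine
$ docker run --rm -v "$PWD":/src alpine ls /src    # $PWD is a path inside the job container, but
                                                   # the host engine resolves it on the host
                                                   # filesystem, so the mount is likely empty or wrong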

An alternative to the DooD approach is to use the “Docker-in-Docker” (DinD) approach, described next.

Using a Docker-in-Docker Service Container

This setup is shown below.

A sample .gitlab-ci.yml file looks like this:

image: docker:19.03.12

services:
  - docker:19.03.12-dind

build:
  stage: build
  script:
    - docker build -t my-docker-image .
    - docker run my-docker-image /script/to/run/tests

As shown, GitLab deploys the job container alongside a “service” container. The latter runs a Docker engine inside it, using the “Docker-in-Docker” (DinD) approach.

This gives the CI job a dedicated Docker engine, thus preventing the CI job from accessing the host’s Docker engine. In doing so, it prevents the collision problems described in the prior section (though the problems related to mounting files or directories remain).

To do this, you must configure the GitLab runner as follows (pay attention to the privileged and volumes clauses):

[[runners]]
  url = "https://gitlab.com/"
  token = REGISTRATION_TOKEN
  executor = "docker"
  [runners.docker]
    tls_verify = true
    image = "docker:19.03.12"
    privileged = true
    disable_cache = false
    volumes = ["/certs/client", "/cache"]

The volumes clause must include the /certs/client mount in order to enable the job container and service container to share Docker TLS credentials.

But notice the privileged clause: it tells GitLab to use privileged Docker containers for both the job container and the service container. This is needed because the service container runs a Docker engine inside, which normally requires insecure privileged containers (though there is now a solution to this, as you'll see a bit later).

Privileged containers weaken security significantly. For example, the CI job can easily control the host machine’s kernel by executing docker run --privileged alpine <cmd>, where <cmd> has full read/write access to the machine's /proc filesystem and is thus able to perform all sorts of low-level kernel operations (including turning off the runner machine, for example).
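
As a hypothetical (and destructive, so don't actually try it) illustration of what that means:

$ docker run --privileged alpine sh -c 'echo o > /proc/sysrq-trigger'    # powers off the runner machine on the spot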

Solution: Using Docker + Sysbox

There is now a way to secure GitLab CI jobs that require interaction with Docker: using the new Sysbox container runtime.

Sysbox is an open-source container runtime that sits below Docker (it’s a new “runc”) and enables deployment of containers that are capable of running systemd, Docker, and even Kubernetes inside, easily and securely.

That is, a simple docker run --runtime=sysbox-runc <container_image> creates a container capable of running Docker inside natively, without any special images or custom container entry-points, and most importantly, with strong container isolation.
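
For example, on a machine with Sysbox installed you can see this with the same DinD image used throughout this article (a minimal sketch; the container name is arbitrary):

$ docker run --runtime=sysbox-runc -d --name dind-test docker:19.03.12-dind    # note: no --privileged flag
$ sleep 15                                                                     # give the inner Docker daemon time to start
$ docker exec dind-test docker version                                         # talks to the Docker engine running *inside* the container
$ docker rm -f dind-test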

By using GitLab with Docker + Sysbox, the CI pipeline security issues described in the prior sections can be resolved. There are a couple of approaches to do this, as described below.

A Secure DinD Service Container

The first approach is to use the Docker-in-Docker (DinD) service container described previously, but create the DinD container with Docker + Sysbox, as shown below.

As shown, the runner machine has the GitLab runner agent, Docker, and Sysbox installed.

The goal is for the GitLab runner to execute jobs inside containers deployed with Docker + Sysbox. This way, CI jobs that require interaction with Docker can use the Docker-in-Docker service container, knowing that it will be properly isolated from the host (because Sysbox enables Docker-in-Docker securely).

In order for this to happen, one has to configure the GitLab runner to select Sysbox as the container “runtime” and disable the use of “privileged” containers.

Here is the runner’s config file (at /etc/gitlab-runner/config.toml). Pay attention to the privileged and runtime clauses:

[[runners]]
  url = "https://gitlab.com/"
  token = REGISTRATION_TOKEN
  executor = "docker"
  [runners.docker]
    tls_verify = true
    image = "docker:19.03.12"
    privileged = false
    disable_cache = false
    volumes = ["/certs/client", "/cache"]
    runtime = "sysbox-runc"

Unfortunately, there is a small wrinkle (for now at least): the GitLab runner currently has a bug in which the “runtime” configuration is honored for job containers but not for “service” containers. This is a problem since the DinD service container is precisely the one we must run with Sysbox.

As a work-around, you can configure the Docker engine on the runner machine to select Sysbox as the “default runtime”. You do this by configuring the /etc/docker/daemon.json file as follows (pay attention to the default-runtime clause):

{
  "default-runtime": "sysbox-runc",
  "runtimes": {
    "sysbox-runc": {
      "path": "/usr/local/sbin/sysbox-runc"
    }
  }
}

After this, restart Docker (e.g., sudo systemctl restart docker).

From now on, all Docker containers launched on the host will use Sysbox by default (rather than the default OCI runc) and thus will be capable of running all jobs (including those using Docker-in-Docker) with proper isolation.
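
A quick sanity check, if you want one:

$ docker info --format '{{.DefaultRuntime}}'    # should print: sysbox-runc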

With this configuration in place, the following CI job runs seamlessly and securely:

image: docker:19.03.12

services:
  - docker:19.03.12-dind

build:
  stage: build
  script:
    - docker build -t my-docker-image .
    - docker run my-docker-image /script/to/run/tests

With this setup you can be sure that your CI jobs are well isolated from the underlying host. Gone are the privileged containers that previously compromised host security for such jobs.

GitLab Runner & Docker in a Container

The setup for this is shown below.

In this approach, I deploy the GitLab runner plus a Docker engine inside a single container launched with Docker + Sysbox. It follows that the CI jobs run inside that container too, in total isolation from the underlying host.

I call these “system containers”, as they resemble a full system rather than a single microservice. In other words, the system container acts as a GitLab runner “virtual host” (much like a virtual machine, but using fast and efficient containers instead of hardware virtualization).

Compared to the approach in the previous section, this approach has some benefits:

  • It allows the GitLab CI jobs to use the shell executor or Docker executor (either the DooD or DinD approaches) without compromising host security, because the system container provides a strong isolation boundary.
  • You can run many GitLab runners on the same host machine, in full isolation from one another. This way, you can easily deploy multiple customized GitLab runners on the same machine as you see fit, giving you more flexibility and improving machine utilization.
  • You can easily deploy this system container on bare-metal machines, VMs in the cloud, or any other machine where Linux, Docker, and Sysbox are running. It’s a self-contained and complete GitLab runner + Docker environment.

But there is a drawback:

  • For CI jobs that interact with Docker, the isolation boundary is at the system container boundary rather than at the job level. That is, such a CI job could easily gain control of the system container and thus compromise the GitLab runner environment, but not the underlying host.

Creating this setup is easy.

First, you need a system container image that includes the GitLab runner and a Docker engine. There is a sample image in the Nestybox Docker Hub repo; the Dockerfile is here.

The Dockerfile is very simple: it takes GitLab’s dockerized runner image, adds a Docker engine to it, and modifies the entrypoint to start Docker. That’s all … easy peasy.

You deploy this on the host machine using Docker + Sysbox:

$ docker run --runtime=sysbox-runc -d --name gitlab-runner --restart always \
    -v /srv/gitlab-runner/config:/etc/gitlab-runner \
    nestybox/gitlab-runner-docker

Then you register the runner with your GitLab server:

$ docker run --rm -it -v /srv/gitlab-runner/config:/etc/gitlab-runner gitlab/gitlab-runner register

You then configure the GitLab runner as usual. For example, you can enable the Docker executor with the DooD approach by editing the /srv/gitlab-runner/config/config.toml file as follows:

[[runners]]
  name = "syscont-runner-docker"
  url = "https://gitlab.com/"
  token = REGISTRATION_TOKEN
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "docker:19.03.12"
    privileged = false
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]

Then restart the gitlab-runner container:

$ docker restart gitlab-runner

At this point you have the GitLab runner system container ready. You can then request GitLab to deploy jobs to this runner, knowing that the jobs will run inside the system container, in full isolation from the underlying host.

In the example above, I used the DooD approach inside the system container, but I could have chosen the DinD approach too. The choice is up to you based on the pros/cons of DooD vs DinD as described above.

If you use the DinD approach, notice that the DinD containers will be privileged, but these privileged containers live inside the system container, so they are well isolated from the underlying host.

Inner Docker Image Caching

One of the drawbacks of placing the Docker daemon inside a container is that containers are non-persistent by default, so any images downloaded by the containerized Docker daemon will be lost when the container is destroyed. In other words, the containerized Docker daemon’s cache is ephemeral.

If you wish to retain the containerized Docker’s image cache, you can do so by bind-mounting a host volume into the /var/lib/docker directory of the container that has the Docker daemon inside.

For example, when using the approach in section A Secure DinD Service Container, you do this by modifying the GitLab runner’s config (/etc/gitlab-runner/config.toml) as follows (notice the addition of /var/lib/docker to the volumes clause):

[[runners]]
  url = "https://gitlab.com/"
  token = REGISTRATION_TOKEN
  executor = "docker"
  [runners.docker]
    tls_verify = true
    image = "docker:19.03.12"
    privileged = false
    disable_cache = false
    volumes = ["/certs/client", "/cache", "/var/lib/docker"]
    runtime = "sysbox-runc"

This way, when the GitLab runner deploys the job and service containers, it bind-mounts a host volume (created automatically by Docker) into the container’s /var/lib/docker directory. As a result, container images downloaded by the Docker daemon inside the service container will remain cached at host level across CI jobs.

As another example, if you are using the approach in section GitLab Runner & Docker in a Container, then you do this by launching the system container with the following command (notice the volume mount on /var/lib/docker):

$ docker run --runtime=sysbox-runc -d --name gitlab-runner --restart always \
    -v /srv/gitlab-runner/config:/etc/gitlab-runner \
    -v inner-docker-cache:/var/lib/docker \
    nestybox/gitlab-runner-docker
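
If you want to see where that cache lives on the host, you can inspect the named volume:

$ docker volume inspect --format '{{.Mountpoint}}' inner-docker-cache    # host directory backing the inner Docker's /var/lib/docker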

A couple of important notes:

  • By making the containerized Docker’s image cache persistent, you are not just persisting images downloaded by the containerized Docker daemon; you are persisting the entire state of that Docker daemon (i.e., stopped containers, volumes, networks, etc.). Keep this in mind: make sure your CI jobs persist only the state you wish to persist, and explicitly clean up any state you do not want to carry across CI jobs (see the sketch after this list).
  • A given host volume bind-mounted into a system container’s /var/lib/docker must only be mounted on a single system container at any given time. This is a restriction imposed by the inner Docker daemon, which does not allow its image cache to be shared concurrently among multiple daemon instances. Sysbox will check for violations of this rule and report an appropriate error during system container creation.
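
For example, here is a sketch of an explicit cleanup step you could append to a CI job’s script (adjust it to whatever you actually want to keep):

docker container prune -f    # drop stopped containers
docker network prune -f      # drop unused networks
docker image prune -f        # drop dangling images, but keep tagged images (the cache we want)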

Conclusion

If you have GitLab jobs that require interaction with Docker, be aware that under GitLab’s recommended setups these jobs get root-level access to the host on which they run, which weakens isolation and puts the stability of your CI infrastructure, and possibly more, at risk.

You can significantly improve job isolation by using Docker in conjunction with the Sysbox container runtime. This article showed a couple of different ways of doing this.

I hope you find this information useful. If you see anything that can be improved or if you have any comments, please let me know!
