Running systemd in the Gitlab docker runner CI

2018-05-18 - Louis-Philippe Véronneau

At the DebConf videoteam, we use ansible to manage our machines. Last fall in Cambridge, we migrated our repositories to salsa.debian.org and I started playing with the Gitlab CI. It's pretty powerful and helped us catch a bunch of errors we had missed.

As it was my first time playing with continuous integration and docker, I had trouble whenever our playbooks used systemd in one way or another, and I couldn't figure out a way to have systemd run in the Gitlab docker runner.

Fast forward a few months and I lost another day and a half working on this issue. I haven't been able to make it work (my conclusion is that it's not currently possible), but I thought I would share what I learned in the process with others. Who knows, maybe someone will have a solution!

10 steps to failure

I first started by creating a privileged Gitlab docker runner on a machine that is dedicated to running Gitlab CI runners. To run systemd in docker, you either need to run privileged docker instances or run them with the --cap-add=SYS_ADMIN option.

If you were trying to run a docker container that runs with systemd directly, you would do something like:

$ docker run -it --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro debian-systemd

I tried replicating this behavior with the Gitlab runner by mounting the right volumes in the runner and giving it the right cap permissions.
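For readers who want to try the same thing, the Docker executor settings involved live in the [runners.docker] section of the runner's config.toml. The sketch below only prints such a fragment. The function name is made up for illustration, and the values simply mirror the docker run command above; this is an assumption about what the setup looks like, not a configuration known to make systemd work.

```shell
#!/bin/sh
# Print a [runners.docker] fragment for the runner's config.toml.
# Assumption: the values mirror the `docker run` flags shown above.
# print_runner_docker_section is a hypothetical helper name.
print_runner_docker_section() {
    cat <<'EOF'
[runners.docker]
  image = "debian-systemd"
  # Alternative to cap_add: privileged = true
  cap_add = ["SYS_ADMIN"]
  volumes = ["/sys/fs/cgroup:/sys/fs/cgroup:ro"]
EOF
}

print_runner_docker_section
```

The fragment would have to be merged by hand into the existing [[runners]] entry, followed by a runner restart.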

The thing is, normally your docker container runs an entrypoint command such as CMD ["/lib/systemd/systemd"]. To run its CI scripts, the Gitlab runner takes that container but replaces the entrypoint command with:

sh -c 'if [ -x /usr/local/bin/bash ]; then\n\texec /usr/local/bin/bash \nelif [ -x /usr/bin/bash ]; then\n\texec /usr/bin/bash \nelif [ -x /bin/bash ]; then\n\texec /bin/bash \nelif [ -x /usr/local/bin/sh ]; then\n\texec /usr/local/bin/sh \nelif [ -x /usr/bin/sh ]; then\n\texec /usr/bin/sh \nelif [ -x /bin/sh ]; then\n\texec /bin/sh \nelse\n\techo shell not found\n\texit 1\nfi\n\n'

That is to say, it tries to run bash.
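With the \n and \t escapes expanded, the runner's command is just a chain of existence checks. The sketch below expresses the same search order as a loop. find_shell is a hypothetical helper that prints the path it finds instead of exec'ing it, so it can be run safely anywhere.

```shell
#!/bin/sh
# Same search order as the runner's command: bash first, then sh.
# find_shell is a hypothetical name; it prints the first executable
# candidate instead of exec'ing it.
find_shell() {
    for candidate in /usr/local/bin/bash /usr/bin/bash /bin/bash \
                     /usr/local/bin/sh /usr/bin/sh /bin/sh; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "shell not found" >&2
    return 1
}

find_shell
```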

If you try to run commands that require systemd such as systemctl status, you'll end up with this error message since systemd is not running:

Failed to get D-Bus connection: Operation not permitted

Trying to run systemd manually once the container has been started won't work either, since systemd needs to be PID 1 in order to work (and PID 1 is bash). You end up with this error:

Trying to run as user instance, but the system has not been booted with systemd.
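Both errors come down to what is running as PID 1. A quick way to check from inside a job is sketched below, assuming a Linux container with /proc mounted. The function names are made up for this example; the /run/systemd/system test is the same one systemd documents for sd_booted().

```shell
#!/bin/sh
# Name of the process running as PID 1 (bash in a runner job,
# systemd on a normally booted host).
pid1_name() {
    cat /proc/1/comm
}

# systemd's documented "was this system booted with systemd" check
# looks for this directory, which systemd creates early at boot.
booted_with_systemd() {
    [ -d /run/systemd/system ]
}

echo "PID 1 is: $(pid1_name)"
if booted_with_systemd; then
    echo "systemd is the init system"
else
    echo "systemd is not the init system"
fi
```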

At this point, I came up with a bunch of creative solutions to try to bypass Gitlab's entrypoint takeover. Turns out you can tell the Gitlab runner to override the container's entrypoint with your own. Sadly, the runner then appends its long bash command right after.

For example, if you run a job with this gitlab-ci entry:

image:
  name: debian-systemd
  entrypoint: ["/lib/systemd/systemd"]
script:
- /usr/local/bin/my-super-script

You will get this entrypoint:

/lib/systemd/systemd sh -c 'if [ -x /usr/local/bin/bash ]; then\n\texec /usr/local/bin/bash \nelif [ -x /usr/bin/bash ]; then\n\texec /usr/bin/bash \nelif [ -x /bin/bash ]; then\n\texec /bin/bash \nelif [ -x /usr/local/bin/sh ]; then\n\texec /usr/local/bin/sh \nelif [ -x /usr/bin/sh ]; then\n\texec /usr/bin/sh \nelif [ -x /bin/sh ]; then\n\texec /bin/sh \nelse\n\techo shell not found\n\texit 1\nfi\n\n'

This obviously fails. I then tried to be clever and use this entrypoint: ["/lib/systemd/systemd", "&&"]. This does not work either, since docker requires the entrypoint to be only one command.
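The underlying reason is that && is shell syntax, and an exec-form entrypoint is never passed through a shell: each list item becomes one literal argument. The same effect can be seen without docker by running a command through env. In this sketch echo stands in for systemd, and run_exec_form is a hypothetical helper name.

```shell
#!/bin/sh
# Run a command the way an exec-form entrypoint is run: as a plain
# argument vector, with no shell around to interpret "&&".
# run_exec_form is a hypothetical helper for this demonstration.
run_exec_form() {
    env "$@"
}

# "&&" is quoted, so it reaches echo as an ordinary argument instead
# of chaining two commands.
run_exec_form echo one "&&" echo two
# prints: one && echo two
```

In the same way, systemd would simply receive "&&" and the runner's sh -c command as extra arguments.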

Someone pointed out to me that you could try running exec /lib/systemd/systemd so that systemd replaces bash as PID 1, but that also fails with an error telling you the system has not been booted with systemd.

One more level down

Since it seems you can't run systemd in the Gitlab docker runner directly, why not try to run systemd in docker in docker (dind)? dind is used quite a lot in the Gitlab CI to build containers, so we thought it might work.

Sadly, we haven't been able to make this work either. You need to mount volumes in docker to run systemd properly, and it seems docker doesn't like mounting volumes from a docker container when they have already been mounted from the docker host... Phew.

If you have been able to run systemd in the Gitlab docker runner, please contact me!

Paths to explore

The only Gitlab runner executor I've used so far is the docker one, since it's what most Gitlab instances run. There is also an LXC executor; I have no experience with it, but it might be possible to run Gitlab CI tests with systemd that way.

