<h1>Running systemd in the Gitlab docker runner CI</h1>
<p>Louis-Philippe Véronneau, 2018-05-18</p>
<p>At the DebConf videoteam, we use ansible to manage our machines. Last fall in
Cambridge, we migrated our repositories to <code>salsa.debian.org</code> and I started
playing with the Gitlab CI. It's pretty powerful and helped us catch a bunch of
errors we had missed.</p>
<p>As it was my first time playing with continuous integration and docker, I had
trouble whenever our playbooks used <code>systemd</code> in one way or another, and I couldn't
figure out a way to have <code>systemd</code> run in the Gitlab docker runner.</p>
<p>Fast forward a few months and I lost another day and a half working on this
issue. I haven't been able to make it work (my conclusion is that it's not
currently possible), but I thought I would share what I learned in the process
with others. Who knows, maybe someone will have a solution!</p>
<h2>10 steps to failure</h2>
<p>I first started by creating a privileged Gitlab docker runner on a machine that
is dedicated to running Gitlab CI runners. To run <code>systemd</code> in docker you either
need to run privileged docker instances or to run them with the
<code>--cap-add=SYS_ADMIN</code> option.</p>
<p>If you were trying to run a docker container that runs with <code>systemd</code> directly,
you would do something like:</p>
<pre>
$ docker run -it --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro debian-systemd
</pre>
<p>I tried replicating this behavior with the Gitlab runner by mounting the right
volumes in the runner and giving it the right <code>cap</code> permissions.</p>
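<p>As a rough sketch of what that can look like, the snippet below prints a
fragment for the runner's <code>config.toml</code>. The <code>privileged</code>,
<code>cap_add</code> and <code>volumes</code> keys are regular settings of the docker
executor, but the values are assumptions copied from the <code>docker run</code>
command above, not the exact configuration used here, and
<code>print_runner_fragment</code> is a name made up for this example.</p>

```shell
#!/bin/sh
# Print a hypothetical [runners.docker] fragment for the runner's config.toml.
# The values mirror the `docker run` example; adapt them to your own runner.
print_runner_fragment() {
    cat <<'EOF'
[[runners]]
  executor = "docker"
  [runners.docker]
    image = "debian-systemd"
    privileged = true
    cap_add = ["SYS_ADMIN"]
    volumes = ["/sys/fs/cgroup:/sys/fs/cgroup:ro"]
EOF
}

print_runner_fragment
```

<p>With <code>privileged = true</code> the extra capability is most likely redundant;
both are shown only because both were tried.</p>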
<p>The thing is, normally your docker container runs an entrypoint command such as
<code>CMD ["/lib/systemd/systemd"]</code>. To run its CI scripts, the Gitlab runner takes
that container but replaces the entrypoint command with:</p>
<pre>
sh -c 'if [ -x /usr/local/bin/bash ]; then\n\texec /usr/local/bin/bash \nelif [ -x /usr/bin/bash ]; then\n\texec /usr/bin/bash \nelif [ -x /bin/bash ]; then\n\texec /bin/bash \nelif [ -x /usr/local/bin/sh ]; then\n\texec /usr/local/bin/sh \nelif [ -x /usr/bin/sh ]; then\n\texec /usr/bin/sh \nelif [ -x /bin/sh ]; then\n\texec /bin/sh \nelse\n\techo shell not found\n\texit 1\nfi\n\n'
</pre>
<p>That is to say, it tries to run <code>bash</code>.</p>
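<p>Spelled out, the one-liner walks a fixed list of shell paths and picks the first
executable one. Here is a minimal sketch of that same lookup which prints the
chosen shell instead of <code>exec</code>-ing it. The function name
<code>find_shell</code> is made up for this example, and the "not found" message goes
to stderr here rather than stdout.</p>

```shell
#!/bin/sh
# Print the first executable shell, trying the same paths in the same
# order as the runner's one-liner (bash candidates first, then sh).
find_shell() {
    for candidate in /usr/local/bin/bash /usr/bin/bash /bin/bash \
                     /usr/local/bin/sh /usr/bin/sh /bin/sh; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "shell not found" >&2
    return 1
}

find_shell
```

<p>Whatever path this prints inside your image is the process that ends up as PID 1
in the job container.</p>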
<p>If you try to run commands that require <code>systemd</code> such as <code>systemctl status</code>,
you'll end up with this error message since <code>systemd</code> is not running:</p>
<pre>
Failed to get D-Bus connection: Operation not permitted
</pre>
<p>Trying to run <code>systemd</code> manually once the container has been started won't work
either, since <code>systemd</code> needs to be PID 1 in order to work (and PID 1 is
<code>bash</code>). You end up with this error:</p>
<pre>
Trying to run as user instance, but the system has not been booted with systemd.
</pre>
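<p>To see which situation a container is in, you can look at what is running as PID 1
and whether the <code>/run/systemd/system</code> directory exists. As far as I know, that
directory is what systemd's <code>sd_booted()</code> check looks for. The sketch below
does both. The function names and the <code>PROC_ROOT</code> and <code>RUN_ROOT</code>
overrides are made up for this example, so the checks can be tried against a fake
directory tree.</p>

```shell
#!/bin/sh
# Name of the process running as PID 1 (for instance "systemd" or "bash").
# PROC_ROOT is a hypothetical override for testing; it defaults to /proc.
pid1_name() {
    cat "${PROC_ROOT:-/proc}/1/comm"
}

# Succeeds if /run/systemd/system exists, which is assumed here to be the
# marker systemd uses to decide whether it booted the system.
# RUN_ROOT is a hypothetical override for testing; it defaults to /run.
systemd_booted() {
    [ -d "${RUN_ROOT:-/run}/systemd/system" ]
}

# Print a one-line summary combining both checks.
report_init() {
    if systemd_booted; then
        echo "booted with systemd (PID 1: $(pid1_name))"
    else
        echo "not booted with systemd (PID 1: $(pid1_name))"
    fi
}
```

<p>Calling <code>report_init</code> from a CI job's <code>script</code> section is a quick
way to confirm that the runner's shell, and not <code>systemd</code>, is PID 1.</p>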
<p>At this point, I came up with a bunch of creative solutions to try to bypass
Gitlab's entrypoint takeover. Turns out you can tell the Gitlab runner to
override the container's entrypoint with your own. Sadly, the runner then
appends its long bash command right after.</p>
<p>For example, if you run a job with this gitlab-ci entry:</p>
<pre>
image:
  name: debian-systemd
  entrypoint: "/lib/systemd/systemd"
script:
  - /usr/local/bin/my-super-script
</pre>
<p>You will get this entrypoint:</p>
<pre>
/lib/systemd/systemd sh -c 'if [ -x /usr/local/bin/bash ]; then\n\texec /usr/local/bin/bash \nelif [ -x /usr/bin/bash ]; then\n\texec /usr/bin/bash \nelif [ -x /bin/bash ]; then\n\texec /bin/bash \nelif [ -x /usr/local/bin/sh ]; then\n\texec /usr/local/bin/sh \nelif [ -x /usr/bin/sh ]; then\n\texec /usr/bin/sh \nelif [ -x /bin/sh ]; then\n\texec /bin/sh \nelse\n\techo shell not found\n\texit 1\nfi\n\n'
</pre>
<p>This obviously fails, since the runner's shell command ends up as arguments to
<code>systemd</code>. I then tried to be clever and use this entrypoint:
<code>["/lib/systemd/systemd", "&&"]</code>. This does not work either, since docker
requires the entrypoint to be only one command.</p>
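<p>The underlying reason is that an entrypoint given as a list is executed directly,
without a shell parsing it, so <code>&&</code> is just another argument. A minimal
sketch of what the entrypoint program receives, using a stand-in function
(<code>show_argv</code>, a name made up for this example) and a shortened version of
the runner's command:</p>

```shell
#!/bin/sh
# Stand-in for an exec-form entrypoint: it only lists the arguments it gets.
show_argv() {
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        printf 'argv[%d]=%s\n' "$i" "$arg"
    done
}

# "&&" is quoted to mimic exec form, where no shell interprets the list.
show_argv "&&" sh -c 'exec /bin/bash'
# prints:
# argv[1]=&&
# argv[2]=sh
# argv[3]=-c
# argv[4]=exec /bin/bash
```

<p>In the real job, <code>systemd</code> would be handed these strings as command line
arguments instead of the runner's shell being started after it.</p>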
<p>Someone pointed out to me that you could try to use <code>exec /lib/systemd/systemd</code>
to replace <code>bash</code> with <code>systemd</code> as PID 1, but that also fails with an error telling you
the system has not been booted with <code>systemd</code>.</p>
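<p>The idea behind that suggestion is that <code>exec</code> replaces the current process
without changing its PID. A minimal sketch of that mechanism on its own, with no
<code>systemd</code> or Gitlab involved (<code>exec_keeps_pid</code> is a name made up
for this example):</p>

```shell
#!/bin/sh
# Start a shell that prints its PID, then exec a second shell that prints
# its own PID. exec replaces the process image instead of forking, so the
# two lines of output are the same number.
exec_keeps_pid() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}

exec_keeps_pid
```

<p>This only illustrates why the trick looks plausible. It does not explain why
<code>systemd</code> still refuses to start in the runner.</p>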
<h2>One more level down</h2>
<p>Since it seems you can't run <code>systemd</code> in the Gitlab docker runner directly,
why not try to run <code>systemd</code> in docker in docker (dind)? dind is used quite a
lot in the Gitlab CI to build containers, so we thought it might work.</p>
<p>Sadly, we haven't been able to make this work either. You need to mount volumes
in docker to run <code>systemd</code> properly and it seems docker doesn't like to mount
volumes from a docker container that have already been mounted from the docker
host... Phew.</p>
<p>If you have been able to run <code>systemd</code> in the Gitlab docker runner, please
<a href="https://veronneau.org">contact me</a>!</p>
<h2>Paths to explore</h2>
<p>The only Gitlab runner executor I've used so far is the docker one, since
it's what most Gitlab instances run. There is also an LXC executor. I have no
experience with it, but it might be possible to run Gitlab CI tests with
<code>systemd</code> this way.</p>