Getting A First Patch Into Docker

A few weeks ago, I was hitting what seemed like a weird bug using Docker. My use case was fairly off the beaten path, so I wasn’t overly surprised when it turned out to be a bug in Docker itself.

Now, fortunately, this post isn’t a rant about how there was a bug in Docker!

Instead, it’s about relating my experience getting a first patch into Docker to fix said bug, and sharing a few tips for others that might want to do the same. In other words, you can hope to learn a little about Docker’s architecture, and about hacking on Docker itself.

Now, I should preface this by saying that prior to this episode I was largely oblivious to Docker’s implementation, and that my familiarity with Go (Docker’s implementation language) was quite limited. So, even if you have have never browsed the Docker codebase, and only know a thing or two about Go, this post should still be easy to follow!

What was I after? #

You can reproduce the bug I needed to fix with the following commands (provided your Docker version doesn’t have the fix: 1.3.1 doesn’t, but 1.3.2 does):

First, create a data container (data_foo):

docker run --name data_foo -v /foo busybox

Then, run a container that mounts volumes from this data container, and from another data container (data_bar) that doesn’t exist (which will cause this to fail):

docker run --name consumer --volumes-from data_foo --volumes-from data_bar busybox

Note that in Docker parlance, run actually means create and start a container. Here, the step that failed was the start step.

Now, start the failed container again. Of course, this should fail (because data_bar is still missing). Except it won’t (and that’s the bug):

docker start consumer

We can make things even more interesting by creating data_bar before starting consumer a second time:

docker run --name data_foo -v /foo busybox
docker run --name consumer --volumes-from data_foo --volumes-from data_bar busybox
docker run --name data_bar -v /bar busybox
docker start consumer

The start still works, the volume from data_bar‘s isn’t mounted! Even though data_bar exists this time:

docker inspect --format "{{ .Volumes }}" consumer
# map[/foo:/foo]

Understanding the Bug #

If you look closely, the gist of the issue is that Docker “remembers” the volumes that were available the first time we attempted to start the container and reuses those the second time we launch.

So, in other words, this looked a lot like a cleanup issue. That was something to work with!

How does Docker start a container? #

Docker’s architecture is quite straightforward: containers are represented by instances of the Container struct, which is defined in daemon/container.go. The same file also contains a few methods that mutate containers (other container methods are defined in the files found next to container.go).

To avoid concurrency problems, methods that mutate containers lock it first (to that end, the Container struct “embeds” a State, which in turns embeds a Mutex that can be Lock'ed).

The method that interests us here is Start(). Start is a wrapper around a bunch of setup steps. If one of those steps fails, then the method will cleanup and return an error (and the error will in turn be returned to the client, which in our case is the Docker CLI).

Note that cleanup is performed using a “deferred” statement, which is similar to a try / finally structure in the sense that the statement is executed when the surrounding function (Start) exits.

Finding the root cause #

Now, this is also where the bug I was looking for was. prepareVolumes was the obvious suspect considering the issue I described above.

prepareVolumes skips initializing volumes in case at least one volume was initialized already, but the bug lies in the fact that one volume being initialized does not guarantee all volumes were, due to the implementation of applyVolumesFrom

Indeed, in applyVolumesFrom, volumes “groups” are initialized one by one, and if one volume group fails to initialize, earlier volume groups aren’t de-initialized, which results in the list of volumes being left in what looks like an initialized state (remember that Docker considers the list of volumes to be initialized if at least one volume was), when in reality it isn’t.

A stopgap fix here was to rewrite applyVolumesFrom so that it first checks whether all volumes groups can be initialized before trying to initialize the first one (there might still be a possibility of ending up with an improperly initialized volumes list if this line throws an error).

Folks from Docker indicated this area of the codebase was being re-thought, so this was good enough for now.

Writing tests #

Of course, fixing a bug is only half the work: we also need tests. Fortunately, Docker has a very solid development story — all you need to develop Docker is… Docker itself (and make)! This is all documented here.

Docker prefers tests to be authored as cli tests (my understanding is that unlike “integration” tests, they are more lightweight). Those are found in the integration-cli folder.

There are a few things you should know if you intend to write your own tests:

Once you’ve written your test, you execute it with make integration-cli.

Note that the TESTFLAGS environment variable lets you specify which tests you’d like to run (e.g. TESTFLAGS='-run ^MyNewShinyTest$' lets you restrict execution to the test you just added, which might save you a lot of time!).

And when you submit your PR, don’t forget to run gofmt first! Checking the contributing guidelines is of course a pre-requisite too.

Wrapping up #

And that’s it for my first experience getting a patch in Docker! I was positively surprised by how readable Docker is (despite me not being very familiar with Go), and really appreciated how easy it was to add tests.

I do wish the docs would have made it a bit clearer what tests should look like, and what library functions are available there, but hopefully this post can help future would-be contributors!

The docker folks were also pretty fast at acknowledging and eventually merging the pull request, which is something you always appreciate when trying to get a patch in.

– Thomas

 
100
Kudos
 
100
Kudos

Now read this

Can it fork?

It’s a well-known fact that Linux (by default) overcommits memory, meaning that allocating memory via e.g. malloc practically never fails (this behavior is configurable via the vm.overcommit_memory sysctl knob). This is possible because... Continue →