Using Docker

For about the last year, I’ve been working mainly with docker to run our infrastructure. We don’t run thousands of machines, but we do run on the order of a hundred containers. Here, I describe our experience with docker. Although I mainly write about the problems we’ve had, I must say that we have now been running our service successfully on docker containers for a good year.

Problems we’ve experienced when running docker

Volumes

One major complication is volumes. Volumes are storage for persistent data on the host. You use them, for example, to save the contents of your database: if you then restart the container, the data is still there. By default, docker uses a directory on the host machine where the data is stored. You can also add additional functionality to volumes by using plugins (see Write a volume plugin for information on how to write a plugin and Volume plugins for a list of available plugins).
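
As a minimal illustration (the volume and image names here are just examples, not our actual setup), a named volume for a database container can be created and mounted like this:

# create a named volume managed by docker (stored under /var/lib/docker/volumes by default)
docker volume create --name pgdata
# mount it into the container at the database's data directory
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres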

So far so good. However, the problems start when you want to move a container from one host to another. Starting the docker container on another host is no problem, but all your data is still stored on the original host. So you somehow need to transfer the data from the original host to the new one, which is often not trivial. Sometimes replication is an option, but not always. So you end up writing a lot of custom code to move the data, and you often need downtime to do it.
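
For illustration, a bare-bones move of a local volume between hosts looks roughly like this (the hostname, container, and volume names are hypothetical, and this assumes the default local volume driver):

# stop the container so the data is consistent (downtime starts here)
docker stop db
# make sure the volume exists on the new host, then copy the data over
ssh newhost "docker volume create --name pgdata"
rsync -az /var/lib/docker/volumes/pgdata/_data/ newhost:/var/lib/docker/volumes/pgdata/_data/
# start the container on the new host, mounting the same volume
ssh newhost "docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres"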

Instability of docker

Quite a few times, when docker released a new version, previously working code would not work anymore. Or worse, new functionality was not yet production ready. So you end up running into many problems simply by trying to get your current setup running on the new docker version.

For instance, we tried to use the overlay network. We literally spent weeks on different approaches but never reached a stable setup. So in the end, just a few days before go-live, we switched to the weave network, which worked flawlessly and which we are still using.

Logs and log aggregation

By default, docker logs are stored on the host machine in json format. So to access the logs, you need to know on which machine the docker container is running, log into that machine, and check the log file.

Another typical setup is to centralize all logs using, for instance, the ELK stack or Fluentd. The disadvantage of centralization is the bandwidth you use by sending all log entries over the network. The advantage is that you can easily add full-text search over the logs.

We tried the centralized approach but quickly switched back to the simple json logs on the host. The reason was that running elasticsearch uses a lot of resources (it’s java) and complicates our setup considerably without adding much functionality. We simply grep our logs and still have search functionality (although less fancy).
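
With the default json-file log driver, the log file of a container can be located and searched directly on the host, for example (the container name here is hypothetical):

# find the json log file of a container on the host
docker inspect --format '{{.LogPath}}' backend
# typically something like /var/lib/docker/containers/<id>/<id>-json.log
# and simply grep it
grep "ERROR" $(docker inspect --format '{{.LogPath}}' backend)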

Monitoring

Monitoring a running system is difficult: you need to be informed about failing components without being overwhelmed by unimportant issues. Monitoring docker containers is even more complicated, because you not only need to monitor all docker containers but also the hosts. Furthermore, containers are deleted and rebuilt much more often than normal servers. We use Prometheus, which is a pull-based monitoring tool. It has a slightly involved setup (you need several running pieces), but once you’ve understood the basics, it’s easy to get running.
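
As a rough sketch of how such a setup can look (the images are the publicly available ones, the ports are their defaults, and the config path is an assumption, not necessarily our exact setup), the exporters and Prometheus itself can run as containers:

# cAdvisor exposes per-container metrics on port 8080
docker run -d --name cadvisor -p 8080:8080 \
    -v /:/rootfs:ro -v /var/run:/var/run:rw -v /sys:/sys:ro -v /var/lib/docker/:/var/lib/docker:ro \
    google/cadvisor
# node-exporter exposes host metrics on port 9100
docker run -d --name node-exporter -p 9100:9100 prom/node-exporter
# Prometheus pulls from both; prometheus.yml lists the scrape targets
docker run -d --name prometheus -p 9090:9090 \
    -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus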

Base system: CoreOS

When we started using docker in production, we ran all containers on a debian-based system. However, debian has far too much functionality baked in, so we switched to CoreOS. CoreOS has three release channels: alpha, beta, and stable. Stable is the one recommended for production use. However, because stable is always a few releases behind the newest docker releases (by a few months), you cannot use the newest functionality. But that turned out to be a good thing, as the system is much more stable (see the docker instabilities above).

Some patterns we use

Versions

Every image we create has a version. This way, we can ensure the correct version is running without going into the application. We do this by creating a VERSION file with the version written in it.

 $ cat VERSION
 2.13.0

The VERSION file is then read into a variable before building and running the image. Here is an example of how to build an image with versioning:

# read the version from the VERSION file
version=`cat VERSION`
echo "Backend $version"
# build the default image, then tag the same build with the explicit version
# (the build cache makes the second build a no-op apart from the tag)
docker build -t abc/backend .
docker build -t abc/backend:$version .

As you can see, we always create two tags, so that the default image (without a version tag) is always the newest version.
To run the image:

# read the default version from the VERSION file
version=`cat VERSION`
# allow overriding via the VERSION environment variable, otherwise use the file's value
: ${VERSION:=$version}
docker run -d abc/backend:$VERSION
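
The line ": ${VERSION:=$version}" means the version can be overridden from the environment; by default the value from the VERSION file is used. For example (the script name is hypothetical):

# run whatever is in the VERSION file
./run.sh
# or explicitly run an older image
VERSION=2.12.0 ./run.sh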

Blue/green deployment

Another pattern we use for deployment is blue/green (see BlueGreenDeployment by Martin Fowler). With this pattern you effectively run the service twice: once as the production version and once for testing (typically the to-be-deployed version). Whenever you deploy, the new code runs on the testing side. Once the testing version has been validated, the colors are switched and the testing version becomes the production version.

To give an example: blue corresponds to the current production, green to testing. We deploy the new code and check it on green. If all is good, the colors are switched and production now runs on green. The next deployment then goes to blue, again for testing.

To know which color is currently in production, you need to store the current color somewhere; we use consul to store the value. Furthermore, you need a load balancer in front of the two running services, one for production and the other for testing. The following bash functions are useful:

# return the color that is NOT currently in production (read from consul's KV store)
next_color() {
    SERVICE_NAME=$1
    CURR_COLOR=`curl -s http://$PROD_SERVER:8500/v1/kv/$SERVICE_NAME/color?raw`
    if [ "$CURR_COLOR" == "blue" ]; then
        echo "green"
    else
        echo "blue"
    fi
}

# return the color that currently serves production
current_color() {
    SERVICE_NAME=$1
    CURR_COLOR=`curl -s http://$PROD_SERVER:8500/v1/kv/$SERVICE_NAME/color?raw`
    if [ "$CURR_COLOR" == "blue" ]; then
        echo "blue"
    else
        echo "green"
    fi
}

# start the new version on the color that is not in production
deploy() {
    SERVICE_NAME=$1
    ADDITIONAL_ARG=$2
    NEXT_COLOR=$(next_color $SERVICE_NAME)
    echo "Test for $SERVICE_NAME will run on: $NEXT_COLOR"
    # restart (not shown) starts the service's containers for the given color
    restart "$SERVICE_NAME" $NEXT_COLOR $ADDITIONAL_ARG
}

# flip the color stored in consul, so the tested color becomes production
switch() {
    SERVICE_NAME=$1
    CURR_COLOR=$(current_color $SERVICE_NAME)
    echo "Current color: $CURR_COLOR"
    NEXT_COLOR=$(next_color $SERVICE_NAME)
    echo "Put next color: $NEXT_COLOR"
    curl -X PUT -d $NEXT_COLOR http://$PROD_SERVER:8500/v1/kv/$SERVICE_NAME/color
}

So to deploy, you always do the following: run deploy XX, test the newly deployed version, and if all is good, run switch XX to put the tested version into production. I may write another post about how exactly to deploy with blue/green. Or you can get the book by Victor Farcic.
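
As a concrete example (assuming the functions above are sourced, PROD_SERVER points to the consul host, and with a hypothetical service name):

# deploy the new version to the currently idle color and test it there
deploy backend
# once validated, flip production to the freshly tested color
switch backend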

Start-all script

Another very useful tool is a script that starts all services. Even though docker has the --restart=always option, it does not always work correctly. So we simply wrote a script that checks, on each host, that the necessary services are running. If one of the hosts is rebooted, we simply run the script and can be sure all services are up.
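
A minimal sketch of the idea (the service names and per-service start scripts are illustrative, not our actual setup):

#!/bin/bash
# for every service that must run on this host, start it if it is not already running
for service in backend frontend db; do
    if [ -z "$(docker ps -q -f name=$service)" ]; then
        echo "starting $service"
        ./start-$service.sh
    fi
done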