Anatomy of a Dockerfile

By | February 3, 2020

I often get to sit in meetings in which the topic of containerization comes up, and you can almost see the visible line drawn between a team’s leadership and the technical folks who have to build reality.  On one hand it’s a trendy topic and a worthwhile long-term endeavor in application management and development.  On the other side of that coin, it’s work within a new technology that seems to border on mysticism to some.  For this exercise we will work to break down the mysticism of building a container, and hopefully in the process show what containers can and cannot do.

At the heart of a container is the Dockerfile.  This is the “cookbook” that will take an application on the journey to becoming a segmented service.  It can also be seen as a simple list of steps that need to be taken in order for the application to run in an environment.

Bob, that sounds suspiciously like a script…

Right you are!  Demystifying the process already.

Building a Dockerfile

In this example I will be converting one of our standalone services, Avorion, into a docker container so that it can be migrated off a dedicated guest and into our Kubernetes cluster. 

When we start with building a new container a few questions need to be answered before we begin.

  1. Where does the code come from?
  2. What does the code require to run?
  3. How do we connect to it?
  4. Does it have persistent data?

These questions will help dictate how the Dockerfile is built.

As we established earlier, a Dockerfile is merely a list of commands that need to be run in order for an application to exist.  Since this is also programming, we can (and should) annotate sections for what they are doing.  Comments start with #.

The first thing we will need to do is establish a base image and who is maintaining it.  Signing your work is optional, but we’re engineers and take pride in our work, right?

FROM ubuntu:18.04
MAINTAINER antimodes201

FROM establishes where this image will originate from.  I like to work out of Ubuntu for most of my basic application services.  You can use anything you like as long as the image either exists in your local repo OR can be found on your remote repo (Docker Hub by default). 

MAINTAINER sets who maintains this package.  This should match whoever is published in your repo.  (Note that newer versions of Docker deprecate MAINTAINER in favor of a label, e.g. LABEL maintainer="antimodes201".)

The next line is a bit of personal preference.   Since my containers are designed to be non-interactive builds (IE you will not be logging directly into them for interactions of any sort) I don’t want my build process to stall out because a prompt is being asked for.  I do this through an ARGUMENT only.  This could also be triggered further down in the individual commands, but my personal preference is up here.  You can read a full breakdown of the debate on whether you should use ARG vs individual commands (but never ENV) here: https://github.com/moby/moby/issues/4032#issuecomment-34597177

# quash warnings
ARG DEBIAN_FRONTEND=noninteractive

When we are building a file there are two ways to set variables depending on their use. 

ARG (ARGUMENT) – Only applies at the time of the Docker build (and thus cannot be called once the image is complete)

ENV (ENVIRONMENT) – Applies for the life of the container, which means it can be modified via Docker run commands and called inside the image.
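A minimal sketch of the difference (the BUILD_BRANCH name here is just an illustration, not part of the Avorion build):

```dockerfile
FROM ubuntu:18.04

# ARG lives only during `docker build`; override with --build-arg BUILD_BRANCH=beta
ARG BUILD_BRANCH=public

# ENV persists into the running container; override with `docker run -e BRANCH=beta`
ENV BRANCH=${BUILD_BRANCH}
```

Once the image is built, `docker run -e BRANCH=beta` can still change BRANCH, but BUILD_BRANCH no longer exists anywhere in the image.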

I like to build my Dockerfile in the same fashion I script: by declaring all variables towards the top that will be used throughout the body.  This means that before we start the build, we need to know something about the application, specifically question 3: how do we connect to it?  For this service the application developer provided a nice list of ports we would need to use (https://avorion.gamepedia.com/Setting_up_a_server).

ENV BRANCH "public"
ENV INSTANCE_NAME "default"
ENV GAME_PORT 27000
ENV QUERY_PORT 27003
ENV STEAM_PORT 27021
ENV STEAM_QUERY 27020
ENV RCON_PORT 27015
ENV ADDITIONAL_ARGS ""
ENV ADMIN ""
ENV TZ "America/New_York"

Here we set a bunch of ENV variables.  I like to use ENV for these to give other users the ability to modify what ports they want to use. 

SEGUE TIME!

When you build a container, even though it uses shared processor space from the host system, it is treated like its own personal guest.  This means the default behavior of any container is to use UTC+0 for the timezone.  This can create headaches when you’re looking at application logs and they don’t line up with your timezone.  Because of this I add tzdata to all of my images and set an environment variable to allow the end user to adjust this to their needs.
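As a sketch, one common way to wire this up on a Debian/Ubuntu base (the symlink and /etc/timezone lines are the usual tzdata convention; the build bakes in the default, while the ENV lets TZ-aware applications pick up a runtime override):

```dockerfile
ENV TZ "America/New_York"

RUN apt-get update && \
    apt-get install -y --no-install-recommends tzdata && \
    ln -snf /usr/share/zoneinfo/${TZ} /etc/localtime && \
    echo ${TZ} > /etc/timezone && \
    rm -rf /var/lib/apt/lists/*
```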

Now it’s time to get into the dependencies for your application.  Depending on your base image, you will need to add in application subsets to allow your code to run, as base images should be stripped-down and lightweight starting points.  Avorion is distributed via steamcmd, so we know it will need gcc (specifically the 32-bit package).  We will need to pull steamcmd itself and unzip it, so this gives us our starting point for what the container will need.  80% of the time when someone asks me for assistance with a docker build, it’s because of a missed dependency.

# dependencies
RUN dpkg --add-architecture i386 && \
        apt-get update && \
        apt-get install -y --no-install-recommends \
                lib32gcc1 \
                wget \
                unzip \
                tzdata \
                ca-certificates && \
                rm -rf /var/lib/apt/lists/*

This introduces us to the RUN instruction.  Simply put, RUN does what it says on the tin: it runs the command that is present.  In the above example we could have broken this out into several different commands, and it would still function.  The downside to doing it that way is that docker images are built in layers, and the additional layers add to the build time and make the system less efficient, something we will talk about towards the end during optimizations.  Suffice to say that if you can run something as a single chain from your unix shell (joined with `&& \`, NOT `;`), you can use a single RUN statement in your Dockerfile.
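The `&&` vs `;` distinction matters inside RUN: with `&&` a failed step fails the whole layer (and the build), while `;` would silently carry on.  A quick plain-shell demonstration:

```shell
#!/bin/sh
# With '&&' the chain short-circuits: the right side runs only if the left succeeded.
result_and=$(false && echo "ran" || echo "skipped")

# With ';' the next command runs regardless of the previous exit code.
result_semi=$(false; echo "ran")

echo "&& gave: ${result_and}"
echo ";  gave: ${result_semi}"
```

Inside a Dockerfile this is exactly why a typo in one apt package name aborts the build instead of producing a half-broken image.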

I break apart my RUN statements by what we are accomplishing in the step.  Now that dependencies are set we should add a user, as the default is root. Having root own the application is a security no-no, especially since the persistent portion of this will live on NFS.  This is where we tackle question 4: does the application have persistent data?  Put another way, “What should the directory structure look like for the application?”

Since I do not own the rights to the server code, nor do I develop it, there would be legal and sustainability issues with me putting the entire server package into the container.  I don’t want to spend my days talking to lawyers or having to push out code every time the developer does an update, so we will separate the application into two segments.  The first part downloads and builds the application, and the second part is where it will live permanently, utilizing Docker volumes.

# create directories
RUN adduser \
    --disabled-login \
    --disabled-password \
    --shell /bin/bash \
    steamuser && \
    usermod -G tty steamuser && \
        mkdir -p /steamcmd && \
        mkdir -p /avorion && \
        mkdir -p /scripts &&  \
        chown steamuser:steamuser /avorion && \
        chown steamuser:steamuser /steamcmd && \
        chown steamuser:steamuser /scripts

Now we have all of our dependencies, made a user, and given ourselves three directories to work with. 

  • Avorion – where the application will be installed (done post build in the client system)
  • Steamcmd – where steamcmd will be installed (will be part of the build)
  • Scripts – where launch scripts will be placed (part of the build)

A note about creating users in your docker build.  While docker uses a shared process namespace, the ids within the container are non-unique.  In my environment I use this to my advantage for file access.  The account I use for all docker applications has the same UID as the one that is generated inside the container.  This means that while the username differs from container to container, when the volumes are mounted (NFS) all of the file systems will appear to have the same service owner.  You could be creative and add a variable to your Dockerfile to allow the end user to set the UID/GID during user creation if you wanted.
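That parameterized version might look something like this (PUID/PGID are names I am inventing for illustration; since user creation happens at build time, ARG fits better than ENV here):

```dockerfile
# Hypothetical build-time knobs so NFS ownership lines up with the host account
ARG PUID=1000
ARG PGID=1000

RUN groupadd -g ${PGID} steamuser && \
    useradd -u ${PUID} -g ${PGID} -m -s /bin/bash steamuser
```

The end user would then build with `docker build --build-arg PUID=1234 --build-arg PGID=1234 .` to match their storage owner.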

One last note here.  Any directory you intend the end user to mount as a volume should exist BUT be empty. Any data that exists in the directory prior to it being mounted over will not be accessible, as docker will initialize the directory at docker run.

It’s time to install steamcmd into its directory.  As steamcmd is rarely updated by Valve, will auto-update on launch, and its license agreement allows for redistribution of the code, I am comfortable with having it be part of the base image.

# Install Steamcmd
USER steamuser
RUN cd /steamcmd && \
        wget https://steamcdn-a.akamaihd.net/client/installer/steamcmd_linux.tar.gz && \
        tar -xf steamcmd_linux.tar.gz && \
        rm steamcmd_linux.tar.gz && \
        /steamcmd/steamcmd.sh +quit

New command: USER.  Here we call out that we want the build to use a new user for all commands from this point forward.  I put this after I have all dependencies installed and the directory structure laid out, as those are things you need to be root to do.  Now that they exist, I want all future commands to be run as this user, and this will include the eventual entry point to the container at run time.

The environment is primed to have the application installed; however, we don’t want to install it during the container build.  We want this to happen after the image has been downloaded into the end user’s environment, so we will want a script to handle this process.

#!/bin/bash -ex
# Start script for Dedicated Server

if [ "${BRANCH}" = "public" ]
then
        # GA Branch
        /steamcmd/steamcmd.sh +login anonymous +force_install_dir /avorion +app_update 565060 +quit
else
        # use specified branch
        /steamcmd/steamcmd.sh +login anonymous +force_install_dir /avorion +app_update 565060 -beta ${BRANCH} +quit
fi

cd /avorion

cp linux64/steamclient.so ./steamclient.so
bin/AvorionServer --galaxy-name ${INSTANCE_NAME} --admin ${ADMIN} --datapath /avorion/saves ${ADDITIONAL_ARGS}

This script becomes the heart and soul of the container.  It utilizes steamcmd (installed during the docker build) to download and install the application package into the avorion directory.  We established during our design phase that that directory should be a volume mount on the end host to allow application persistency.  The remaining parts let us utilize the ENV variables that were set in the Dockerfile to influence how the application behaves.  For example, we use BRANCH from the Dockerfile to tell the start script which repo the application should come from.

Once you have your script built out, it should be saved in the same directory as your Dockerfile and made executable.  I like to call mine start.sh but anything will do.  We will then need to tell docker it should be included in the build.

ADD start.sh /scripts/start.sh

Another Segue.

There are two ways to add files into your docker image during build time.  ADD and COPY. 

ADD allows you to add in any type of file from any endpoint.  This means you can add in a file from a github repo to your dockerfile without having to download it locally. 

COPY allows you to only add in files that are local to the dockerfile.

There’s been debate on which is correct / best practice.  I use ADD because it’s more versatile, and I have several cases where my start.sh is identical and thus pulled from git. I then use ENV flags to alter application IDs and locations.
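Side by side, the two look like this (the URL below is a placeholder, not a real repo):

```dockerfile
# COPY only takes files from the local build context
COPY start.sh /scripts/start.sh

# ADD can also fetch from a URL (and auto-extracts local tar archives)
ADD https://raw.githubusercontent.com/example/repo/master/start.sh /scripts/start.sh
```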

The container now has all its dependencies, a directory structure, and a command set that we will want it to run on launch.  Since docker containers have no ports open for communication by default, we will want the image to expose some.

# Expose some ports
EXPOSE 27000/udp
EXPOSE 27000/tcp
EXPOSE 27003/udp
EXPOSE 27020/udp
EXPOSE 27021/udp
EXPOSE 27015/tcp

EXPOSE – Opens up the container’s port to allow communication on it.  The default behavior is to expose on TCP; however, the application requires UDP for communication on some of the ports.  As many services we work with require a combination of both, it is a best practice of ours to always label the protocol.  Something to note: this instruction does not publish the port when the container is built and running.  The ports will still need to be published as part of the docker run process.
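When it comes time to actually run the container, publishing might look something like the following (the image tag is hypothetical; each -p maps a host port to a container port, and -v supplies the persistent volume we discuss below):

```shell
docker run -d \
  --name avorion \
  -e TZ="America/New_York" \
  -v /srv/avorion:/avorion \
  -p 27000:27000/udp -p 27000:27000/tcp \
  -p 27003:27003/udp -p 27020:27020/udp \
  -p 27021:27021/udp -p 27015:27015/tcp \
  antimodes201/avorion-server:latest
```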

Let’s create a volume mount for the persistent part of the container.

# Make a volume
# contains configs and world saves
VOLUME /avorion

VOLUME tells the container it should expect an external mount for this path.  Due to how this container is structured we could skip this step, as the application is installed post docker build.  I still put this into the Dockerfile as it gives anyone who pulls it an extra layer of documentation for what they need to set up in their environment to have the service function properly. 

Lastly, we need to tell the container what it should do once it’s up.

CMD ["/scripts/start.sh"]

When a container starts up there are two ways you can interact with them, CMD and ENTRYPOINT.

Docker provides two rather generic explanations on the difference between them:

CMD: The main purpose of a CMD is to provide defaults for an executing container.

ENTRYPOINT: An ENTRYPOINT helps you to configure a container that you can run as an executable.

Not exactly helpful. I break them down like this:

CMD: The container will only ever run this one command during its life and shuts down when the task is completed.

ENTRYPOINT: The container has an interactive aspect to it and is shut down through an application end process.

As the Avorion server is non-interactive through the container (access happens through application commands or RCON), I use CMD.  Almost all of my containers are designed in this manner.
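A sketch of the contrast between the two patterns (these are alternatives, not meant to be combined in one Dockerfile):

```dockerfile
# Pattern 1 - CMD alone: the whole command is easily replaced at run time.
#   docker run image            -> runs /scripts/start.sh
#   docker run image /bin/bash  -> replaces it entirely
CMD ["/scripts/start.sh"]

# Pattern 2 - ENTRYPOINT + CMD: the entrypoint stays fixed, CMD supplies
# default arguments that the user can override.
#   docker run image --some-flag  -> runs /scripts/start.sh --some-flag
ENTRYPOINT ["/scripts/start.sh"]
CMD []
```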

If we put it all together we have:

FROM ubuntu:18.04
MAINTAINER antimodes201

# quash warnings
ARG DEBIAN_FRONTEND=noninteractive

# Set some Variables
ENV BRANCH "public"
ENV INSTANCE_NAME "default"
ENV GAME_PORT 27000
ENV QUERY_PORT 27003
ENV STEAM_QUERY 27020
ENV STEAM_PORT 27021
ENV RCON_PORT 27015
ENV ADDITIONAL_ARGS ""
ENV ADMIN ""
ENV TZ "America/New_York"

# dependencies
RUN dpkg --add-architecture i386 && \
        apt-get update && \
        apt-get install -y --no-install-recommends \
                lib32gcc1 \
                wget \
                unzip \
                tzdata \
                ca-certificates && \
                rm -rf /var/lib/apt/lists/*

# create directories
RUN adduser \
    --disabled-login \
    --disabled-password \
    --shell /bin/bash \
    steamuser && \
    usermod -G tty steamuser && \
        mkdir -p /steamcmd && \
        mkdir -p /avorion && \
        mkdir -p /scripts && \
        chown steamuser:steamuser /avorion && \
        chown steamuser:steamuser /steamcmd && \
        chown steamuser:steamuser /scripts

# Install Steamcmd
USER steamuser
RUN cd /steamcmd && \
        wget https://steamcdn-a.akamaihd.net/client/installer/steamcmd_linux.tar.gz && \
        tar -xf steamcmd_linux.tar.gz && \
        rm steamcmd_linux.tar.gz && \
        /steamcmd/steamcmd.sh +quit

ADD start.sh /scripts/start.sh

# Expose some port
EXPOSE 27000/udp
EXPOSE 27000/tcp
EXPOSE 27003/udp
EXPOSE 27020/udp
EXPOSE 27021/udp
EXPOSE 27015/tcp

# Make a volume
# contains configs and world saves
VOLUME /avorion

CMD ["/scripts/start.sh"]

Congratulations: if we run docker build, this builds out a working container ready to be used.  However, there’s room for improvement. 

Dockerfile Optimization

If you ran docker build, you likely saw a series of steps being called out as the image was being constructed.  Docker creates snapshots of each command that is being run and assembles them in layers, each layer being a step.  This is one of the reasons that, at the beginning when discussing RUN commands, we like to keep those as compact as possible.  Since each of these layers is an independent snapshot of the command that was run (and will be reused in other builds by docker), we should also look to remove any unwanted files in the same command that created them.  For example, we did:

RUN cd /steamcmd && \
        wget https://steamcdn-a.akamaihd.net/client/installer/steamcmd_linux.tar.gz && \
        tar -xf steamcmd_linux.tar.gz && \
        rm steamcmd_linux.tar.gz && \
        /steamcmd/steamcmd.sh +quit

In this single command we downloaded the steamcmd package, installed it, and then deleted the original archive.  This keeps the downloaded package out of the cached layer and keeps its size down. 
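The same fetch-unpack-cleanup pattern can be demonstrated in plain shell, using a throwaway archive as a stand-in for the steamcmd tarball (had the `rm` landed in a separate RUN, the archive would still be baked into the earlier layer):

```shell
#!/bin/sh
# Stand-in for the steamcmd step: unpack and clean up in one chain,
# so the archive never survives into the resulting "layer".
workdir=$(mktemp -d)
cd "${workdir}"
printf 'server files' > payload.txt
tar -czf payload.tar.gz payload.txt && rm payload.txt   # fake "download"

tar -xzf payload.tar.gz && \
    rm payload.tar.gz                                   # extract, then delete the archive

ls "${workdir}"
```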

The second optimization tip I have is to help speed up those build times.  Remember way back at the beginning (the first line), FROM?  This does not have to be an OS image; it can instead be a base image that you (or someone else in the community) have built.  This is especially helpful when many of your services share the same structure.  When you do this, all of the layers in the FROM image come over prebuilt, cutting down on your build time.

When I first started converting services over to docker, I copied and pasted the same build file over and over because it worked.  As more services shifted to using our Docker environment, I realized we could cut out a portion of our build process by converting the sections that do not change (steamcmd, user, and directory structure) into a base image and utilizing ARG in conjunction with ENV to set the parts of the build that differ. In the quest for a cleaner, more simplified build, we could be even lazier.

The Dockerfile now looks like:

FROM antimodes201/steamcmd-base:1.0
MAINTAINER antimodes201

# Set some Variables
ARG GAME_PORT=27000
ARG QUERY_PORT=27003
ARG STEAM_QUERY=27020
ARG STEAM_PORT=27021
ARG RCON_PORT=27015

# set Environment
ENV BRANCH "public"
ENV INSTANCE_NAME "default"
ENV GAME_PORT $GAME_PORT
ENV QUERY_PORT $QUERY_PORT
ENV STEAM_PORT $STEAM_PORT
ENV STEAM_QUERY $STEAM_QUERY
ENV RCON_PORT $RCON_PORT
ENV ADDITIONAL_ARGS ""
ENV ADMIN ""
ENV TZ "America/New_York"


ADD start.sh /scripts/start.sh

# Expose some port
EXPOSE $GAME_PORT/udp
EXPOSE $GAME_PORT/tcp
EXPOSE $QUERY_PORT/udp
EXPOSE $STEAM_PORT/udp
EXPOSE $STEAM_QUERY/udp
EXPOSE $RCON_PORT/tcp

CMD ["/scripts/start.sh"]

(Figure: comparing hub.docker build times.)

A 33% improvement in build time and a simplification in deployment? Yes please.

Hopefully this helps demystify the process of “containerization” or “refactoring applications” to make them function in a docker environment.
