With containers come container images. You don't strictly need them, but it is nicer to isolate the filesystem too, so that you can fix the packaging problem of an application and laugh at the developers of dynamically linked libraries.

The most popular, and arguably the only sane, way to create container images today is a Dockerfile. There are plenty of tools that can build container images, and almost all of them take the Dockerfile format as input. There are other options, like buildah, which has its own format. Then there is Packer, which lets you write what they call, wait for it, a packerfile. The great thing about a packerfile is that it is much more declarative, and I love that. However, Packer was built to create VM images, so it lumps all the changes you want into a single container layer. That is not great, because layers help not just with build speed but also with deployment speed, since older layers are cached.

Anyway, this blog post is mostly going to try to convince you that Dockerfiles are not the best possible way to create container images. I am okay with a Dockerfile being the input for whatever tool builds the image, but it shouldn't be something that I have to write by hand. It is essentially a shell script, and that makes so many things terribly difficult to do.

So, first let's see what the difference is between a declarative and an imperative language. These terms are used more often in the context of programming languages. I am not going to go into the details of the difference, so let me explain with an example instead. Here is what a typical build script for a container image looks like today:

FROM ubuntu:18.10

ENV LANG=en_US DEBIAN_FRONTEND=noninteractive

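# Install the packages and clean the apt cache in the same layer.
RUN apt update && \
    apt install --yes python3-mysql mariadb-server && \
    apt clean

# Create a system user for the application.
RUN useradd --system mailman

USER mailman

# SHELL only accepts the JSON array form.
SHELL ["/bin/bash", "-c"]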

CMD ["mailman", "start"]

This is an imperative script: you write exactly how things should work and exactly what the image should look like. You said that you want the apt related changes in one single layer and the useradd command in a second layer. Then you said which environment variables you want. You wrote the precise commands and flags for apt install, like --yes to skip the confirmation prompt and the DEBIAN_FRONTEND environment variable to signal to apt that it is not running in an interactive session.

This is great. However, it means a lot of duplication across the tens or hundreds of container images that one might maintain, be it as a company or as an individual. You can do every possible thing today with bash scripts and Unix tools, but programming languages were invented so that you could do it in a more maintainable way and possibly avoid duplication.

What information do you need to create a container image? Well, most of the instructions in a Dockerfile are pretty declarative, like FROM, ENV, USER, SHELL, CMD, EXPOSE and several others that you can find in the official Dockerfile reference. The only non-declarative instruction is RUN, IMO.

RUN basically allows arbitrary commands, which users execute to set up an image. It is great to have power like this, but not many people need it, at least not for the usual tasks like installing packages, chowning files, adding users, setting up configuration (using postconf -e to set up postfix, for example), adding pre-initialization steps to the ENTRYPOINT, etc.

We don't really need to reinvent the entire wheel to solve all the above problems. There are tons of tools that have already solved them. They are called configuration management tools: Chef, Puppet, Ansible. I understand that their model is structured around managing running systems and enforcing policies at runtime. (Ansible is slightly different, since it can be used for both one-time configuration and runtime enforcement. Maybe the others can too. I am not very familiar with Chef and Puppet personally.)

But there does seem to be a need for a more specialized, config-management-like tool that can emit a Dockerfile from a configuration file, with as few shell scripts and commands as possible. How much control you really want to give your users is a design decision, and not all programmers want that much flexibility in the system. Sometimes people are happy to have some decisions made on their behalf, following best practices, so that they can focus on their work. Unless your work is creating container images, you shouldn't have to worry about layers, base images, how efficiently images build and download, etc.

[my-image]
base = "ubuntu:18.10"
install = [ "python3-dev", "mariadb"]
user = "mailman"
command = "mailman start"

So, this is what I expect a simple declarative format for a container image to look like. Note that it is not something novel that I have come up with by any means. It is more or less exactly like a packerfile. But what really is the benefit of using something like this?

  • First, just by looking at the base I can infer a couple of things. It is an apt based system, so I know how to install the packages from the install section, and I know how to clean up the caches for an apt based system (we can do the same for dnf or pacman based systems too).
  • Looking at the install section, I know what the final derived license of my container image can be, and I can optimize the container image as much as I like by deleting every single database file for apt or dnf.
  • The user directive tells me which user to run the final command as. I don't necessarily need to use the USER directive of the Dockerfile, since there are setup scripts in the ENTRYPOINT that need root, so the tool can just use something like su-exec and render an entrypoint script itself.
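To make the list above a bit more concrete, here is a minimal sketch of how such a tool could infer the package manager from the base and render a Dockerfile. Everything in it is an assumption for illustration: the render_dockerfile function, the PACKAGE_MANAGERS table and the naive way the distro name is extracted from the base. For simplicity it emits a plain USER directive rather than a su-exec entrypoint.

```python
from __future__ import annotations

import json
import shlex

# Hypothetical table: distro name -> (install command, cache cleanup command).
PACKAGE_MANAGERS = {
    "ubuntu": (
        "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --yes {pkgs}",
        "apt-get clean && rm -rf /var/lib/apt/lists/*",
    ),
    "debian": (
        "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --yes {pkgs}",
        "apt-get clean && rm -rf /var/lib/apt/lists/*",
    ),
    "fedora": ("dnf install -y {pkgs}", "dnf clean all"),
}


def render_dockerfile(image: dict) -> str:
    """Render a Dockerfile from one parsed image declaration."""
    base = image["base"]
    # Naive guess: "ubuntu:18.10" -> "ubuntu", "library/fedora:30" -> "fedora".
    distro = base.rsplit(":", 1)[0].split("/")[-1]
    if distro not in PACKAGE_MANAGERS:
        raise ValueError(f"don't know how to install packages on {base!r}")
    install_cmd, cleanup_cmd = PACKAGE_MANAGERS[distro]

    lines = [f"FROM {base}"]
    packages = image.get("install", [])
    if packages:
        # Install and clean up in one RUN so the cache never lands in a layer.
        lines.append(
            "RUN " + install_cmd.format(pkgs=" ".join(packages)) + " && " + cleanup_cmd
        )
    user = image.get("user")
    if user:
        lines.append(f"RUN useradd --system {user}")
        lines.append(f"USER {user}")
    if "command" in image:
        # Use the exec form of CMD, e.g. CMD ["mailman", "start"].
        lines.append("CMD " + json.dumps(shlex.split(image["command"])))
    return "\n".join(lines) + "\n"


# The [my-image] declaration from above, as it would look after parsing.
dockerfile = render_dockerfile(
    {
        "base": "ubuntu:18.10",
        "install": ["python3-dev", "mariadb"],
        "user": "mailman",
        "command": "mailman start",
    }
)
print(dockerfile)
```

The point is not this particular table, but that the apt flags, the cache cleanup and the layering decisions live in one place instead of being repeated in every Dockerfile.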

There are more things that you can do in an automated fashion with a tool to remove the duplicated stuff in a Dockerfile. You can go a step beyond as well.

One idea that I had was to manage system dependencies for applications. You can even give up control of the install, base and user declarations and do something like:

[my-image]
wants = [ "libffi.so.1", "libopenssl.so.1", "libpython.so.3.5", "mysqld"]
command = "mailman start"

Wants is an even more declarative way to define precisely what dependencies you have. For example, you may want MySQL. Its commonly known binary is mysqld, and it comes from different packages on different operating systems. This wishful tool can figure out which package provides which library on apt and dnf systems.
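A rough sketch of that lookup, under loud assumptions: the PROVIDES index and the resolve_wants function are hypothetical, and the package names in the index are illustrative placeholders that have not been checked against any real repository. A real tool would presumably build this index from the package manager's own metadata (something along the lines of apt-file search or dnf provides).

```python
from __future__ import annotations

# Hypothetical index: package manager -> {wanted file: providing package}.
# The package names are placeholders for illustration only.
PROVIDES = {
    "apt": {
        "mysqld": "mariadb-server",
        "libffi.so.1": "libffi-placeholder",
        "libpython.so.3.5": "libpython-placeholder",
    },
    "dnf": {
        "mysqld": "mariadb-server",
        "libffi.so.1": "libffi-placeholder",
    },
}


def resolve_wants(wants: list[str], manager: str) -> tuple[list[str], list[str]]:
    """Map wanted binaries and libraries to packages.

    Returns (packages to install, wants that nothing provides).
    """
    index = PROVIDES[manager]
    packages = set()
    missing = []
    for want in wants:
        if want in index:
            packages.add(index[want])
        else:
            missing.append(want)
    # Sort so the generated install line is stable between runs.
    return sorted(packages), missing


packages, missing = resolve_wants(["mysqld", "libffi.so.1", "libopenssl.so.1"], "apt")
print("install:", packages)
print("unresolved:", missing)
```

Reporting the unresolved wants separately matters here, since a silently dropped dependency would only show up when the container starts.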

The tool can also figure out which base images you already have in your container registry and reuse one of them, so that you don't have to build an image at all.
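One way this reuse could work, as a sketch only: assume the tool keeps a record of which packages each existing image contains, and picks the image that covers everything needed with the fewest extras. The pick_reusable_image function and the shape of the registry record are my assumptions, and the image names are made up.

```python
from __future__ import annotations

from typing import Optional


def pick_reusable_image(needed: set[str], registry: dict[str, set[str]]) -> Optional[str]:
    """Return the existing image that covers `needed` with the fewest extra packages."""
    candidates = [
        # Sort key: number of packages we don't need, then name as a tie-breaker.
        (len(installed - needed), name)
        for name, installed in registry.items()
        if needed <= installed
    ]
    if not candidates:
        return None  # Nothing fits, so a new image has to be built.
    return min(candidates)[1]


# Hypothetical record of what is already in the registry.
registry = {
    "registry.example/base-python": {"python3-dev"},
    "registry.example/base-mail": {"python3-dev", "mariadb"},
    "registry.example/kitchen-sink": {"python3-dev", "mariadb", "openssl", "git"},
}
print(pick_reusable_image({"python3-dev", "mariadb"}, registry))
```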

This tool can go even a step further and bridge the gap between "cloud native systems" and "traditional applications". Tons of people write yet more shell scripts just to render environment variables into the configuration files that traditional applications expect.
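The rendering itself is small enough that a tool could generate it instead of everyone writing it in shell. A minimal sketch using Python's string.Template follows. The render_config function, the template contents and the MAIL_HOSTNAME and MAIL_NETWORKS variable names are made up for illustration.

```python
from __future__ import annotations

import os
from string import Template
from typing import Mapping, Optional


def render_config(template_text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Fill $VAR and ${VAR} placeholders from the environment.

    Raises KeyError when a variable is missing, so the container fails at
    startup instead of running with a half-rendered configuration file.
    """
    values = os.environ if env is None else env
    return Template(template_text).substitute(values)


# Hypothetical template for a traditional, file-configured application.
template = "myhostname = $MAIL_HOSTNAME\nmynetworks = ${MAIL_NETWORKS}\n"
rendered = render_config(
    template,
    {"MAIL_HOSTNAME": "mail.example.org", "MAIL_NETWORKS": "10.0.0.0/8"},
)
print(rendered)
```

An entrypoint script would call this with no env argument, so that the values come from the container's actual environment.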

None of these scripts actually set up any applications, because there are so many ways to set up an application that there is no way to generalize. This is more about the management of system dependencies and the operating system, which is actually a hard process.

I am not sure if I have convinced anyone that this can be useful, but I really wish something like this existed. I would certainly use it. Since this is a TOML file, I can even put multiple applications in a single file and look at them together, pulling the common parts out into an image right at the top and deriving the rest of the images from it.

[base-image]
base = "ubuntu:18.10"
install = ["openssl", "libffi-dev", "libtorrent-devel", "mysql"]
user = "mailman"

[container-image-1]
# reference to the image above.
base = "ref:base-image"
command = "mailman start"

[container-image-2]
base = "ref:base-image"
command = "mysqld"

This generates two images from the same base image, but with different commands. I know it is not the best example, but my intention was only to show what is possible. Users will then come up with innovative ways to do other things with it :)
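For the ref: bases to work, the tool has to build the images in dependency order. Here is a small sketch of that step. It assumes the TOML file has already been parsed into a dict (for example with tomllib on Python 3.11+) and that each generated image is tagged with its section name. Both of those, and the function names, are my assumptions.

```python
from __future__ import annotations

REF_PREFIX = "ref:"


def build_order(images: dict[str, dict]) -> list[str]:
    """Order image names so that every referenced base is built first."""
    order: list[str] = []
    state: dict[str, str] = {}  # name -> "visiting" or "done"

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"circular reference involving {name!r}")
        if name not in images:
            raise KeyError(f"reference to unknown image {name!r}")
        state[name] = "visiting"
        base = images[name].get("base", "")
        if base.startswith(REF_PREFIX):
            visit(base[len(REF_PREFIX):])
        state[name] = "done"
        order.append(name)

    for name in images:
        visit(name)
    return order


def from_line(name: str, images: dict[str, dict]) -> str:
    """Render the FROM line, turning "ref:x" into the locally built image x."""
    base = images[name]["base"]
    if base.startswith(REF_PREFIX):
        base = base[len(REF_PREFIX):]
    return f"FROM {base}"


# The three sections from the example above, declared out of order on purpose.
images = {
    "container-image-1": {"base": "ref:base-image", "command": "mailman start"},
    "base-image": {"base": "ubuntu:18.10", "user": "mailman"},
    "container-image-2": {"base": "ref:base-image", "command": "mysqld"},
}
for name in build_order(images):
    print(name, "->", from_line(name, images))
```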