Python packaging and distribution is still not perfect. Packaging and distributing Django projects is even less so, and a bit painful. There are quite a few hurdles to overcome to package Django projects for distribution. If you are interested, I’ve given a few talks on the topic, like this one at PyCon Sei.
So, what is Docker? It is a way to isolate a program at the operating system level. Unlike virtualization, where the entire hardware running the program is isolated from the host computer via emulation, with containers only the operating system running the program is isolated from the host computer, using a separate “context”: a technique that allows running an operating system inside an operating system. While technically not the same, the end result “feels” to the end user like a sort of “lightweight virtualization”. Hardware capable of hosting 10 virtual machines will be able to host 100 or more containers.
Aside from operating system isolation, Docker (and containers in general) also works as a packaging solution. This is because you are storing your program and all the requirements needed to execute it in a repeatable manner in the same distribution medium, all the way down to the operating system. What this means is that a Docker packaged program will run exactly the same regardless of the host operating system. In that sense, it offers many of the same benefits as virtualization without the execution overhead.
“Dockerizing” Django projects is an art. There are no official guidelines and the entire community is still learning what works and what doesn’t. If you are interested in learning more about the state of the technologies and techniques used to package a Django project as a Docker image, you can watch my PyCon Sette presentation on the topic.
Hopefully, from that introduction, you will have a better idea why creating a Docker image for Mayan is a good idea. You get a distributable image that reduces installation steps from a few dozen to just two, and you end up with an execution unit isolated from your system that can be controlled at will.
The packaging philosophy on Docker is that each container should perform just one function (or as few as possible). This is called “separation of concerns” and offers many advantages, like being able to scale specific parts of your Docker deployment. Following this philosophy, the Mayan EDMS image includes the bare minimum to provide a running instance. That is, the Mayan EDMS image contains Mayan EDMS, a web server (NGINX), a broker to move messages for the background tasks and store their results (Redis for both), and a process manager to keep everything chugging along once deployed. Since Python has native support for SQLite, it is used by default. This setup will give you a fairly decent and well-performing deployment up to several tens of thousands of documents and a few concurrent users. If you want to go beyond that, then you need to scale up your Mayan EDMS Docker deployment, starting with the database, hence the purpose of this post.
To get a Mayan EDMS Docker installation using MySQL we will be launching two containers: one for MySQL and one for Mayan. Normally programs are configured via configuration or “ini” files, but the philosophy for dockerized programs is to configure containers via environment variables. Here are the steps to follow.
Docker containers are isolated by design; that means execution, file access, and network access isolation. To get our two Docker containers “talking”, we create a network just for them. For that we use the command:
docker network create mayan -d bridge
This creates a bridge network, a simple network type used to connect hosts without routing. We’ll call our bridge network mayan. We will deploy our containers using this network so that they have network access to each other, as if they were the only two computers on a local area network (LAN). Docker recently added support for dynamic domain names, which means we can reference the containers by name and not just by IP address.
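To confirm the network was created, and later to see which containers are attached to it, you can list and inspect it. A quick sketch (docker commands assume a working Docker daemon):

```shell
# Verify that the bridge network exists
docker network ls --filter name=mayan

# Inspect it; once containers are attached, they appear in the output
docker network inspect mayan
```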
As mentioned above, we will configure the containers using environment variables that are passed to the container when it is created. Since we will be passing a few variables, the command line to launch the containers would be long and prone to data entry mistakes. For these situations, Docker allows us to define those environment variables in a file and pass the filename when launching the containers. Let’s do that now and create a file named envfile with the content:
# MySQL container
MYSQL_ROOT_PASSWORD=mysql_root_password
MYSQL_PASSWORD=mayan_password
MYSQL_DATABASE=mayan_db
MYSQL_USER=mayan_user

# Mayan container
MAYAN_DATABASE_DRIVER=django.db.backends.mysql
MAYAN_DATABASE_NAME=mayan_db
MAYAN_DATABASE_USER=mayan_user
MAYAN_DATABASE_PASSWORD=mayan_password
MAYAN_DATABASE_HOST=mayan-mysql
MAYAN_DATABASE_PORT=3306
The first set of variables configures the MySQL container to create a database when it launches, create a user, and grant that user all permissions on the database. The second set of variables configures the Mayan container to use the specified credentials to access the database container, which from Mayan’s point of view is just another host on the network, in this case a host named mayan-mysql. The MAYAN_DATABASE_DRIVER line tells Mayan which Django database driver to use when accessing the database.
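One caveat worth noting: Docker’s --env-file format is plain KEY=VALUE lines, with no shell quoting and no spaces around the equals sign. A quick sanity check for malformed lines (a simple grep sketch, assuming the file is named envfile as above):

```shell
# Print any line that is not a comment, a blank line, or a KEY=VALUE pair
grep -vE '^(#|$|[A-Za-z_][A-Za-z0-9_]*=)' envfile || echo "envfile looks well-formed"
```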
Now we proceed to create and launch the first container, the MySQL container, using the command line:
docker run -d --name mayan-mysql --restart=always --env-file envfile -v mayan_mysql:/var/lib/mysql --net=mayan mysql:latest
This tells Docker to create and run a container named mayan-mysql that will restart every time it stops for whatever reason (hangs or the host restarts), that will use the file envfile for configuration, will store its data from /var/lib/mysql into persistent Docker storage (called a volume) by the name of mayan_mysql, will use the mayan network, and will use the latest official MySQL Docker image.
You can watch the log files of the container as it initializes with the command:
docker logs mayan-mysql
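Once the logs show that MySQL is ready to accept connections, you can optionally verify that the database and user from the env file were created. A sketch, assuming the credentials from envfile above (adjust if you changed them):

```shell
# Run a test query inside the MySQL container using the Mayan credentials
docker exec mayan-mysql \
    mysql -u mayan_user -pmayan_password mayan_db -e "SELECT 1;"
```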
Finally we launch the Mayan container with the command:
docker run -d --name mayan-edms --restart=always --env-file envfile -v mayan_data:/var/lib/mayan --net=mayan -p 80:80 mayanedms/mayanedms:2.6.4-3
This tells Docker to create and run a container named mayan-edms that will restart every time it stops for whatever reason, that will use the file envfile for configuration, will store its data from the directory /var/lib/mayan into a persistent Docker volume called mayan_data, will expose its internal port 80 (HTTP) as port 80 to the outside world, will use the mayan network, and will use version 2.6.4-3 of the official Mayan EDMS Docker image.
Inspect the logs of the container using:
docker logs mayan-edms
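If you prefer to watch the initialization live, you can follow the log stream, and optionally confirm that the database settings were picked up from envfile (this assumes the container name and variable names used above):

```shell
# Follow the logs until initialization completes (Ctrl+C to stop)
docker logs -f mayan-edms

# Confirm the database environment variables were passed into the container
docker exec mayan-edms env | grep MAYAN_DATABASE
```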
and you should see the container creating the database tables and performing all the required initialization. After a few minutes, you will be able to browse to localhost (or 127.0.0.1) on port 80 on the machine running the containers and use Mayan normally. Since the containers were launched with the --restart=always option, you don’t need to do anything to start them up the next time you boot the host computer.
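As a final sanity check, you can confirm that both containers are running on the mayan network and that their restart policy is set. A sketch using the container names chosen above:

```shell
# List the containers attached to our bridge network
docker ps --filter network=mayan

# Confirm the restart policy on each container
docker inspect --format '{{.Name}}: {{.HostConfig.RestartPolicy.Name}}' \
    mayan-mysql mayan-edms
```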
Compare those steps to the number of steps required for a “bare metal” production deployment of Mayan EDMS (or any Django project) and you will see why Docker is becoming such a successful medium not just to run code, but also to distribute it.