Zero downtime deployments with zero k8s
My production setup is still a simple VM, but I wanted zero-downtime deployments anyway. This is how I made that work.
Don’t worry, despite the snarky title this won’t be an anti-k8s rant. In fact, I like k8s; I even run a small cluster at home. But for my production server I’m still on an old-fashioned VM.
My setup is really simple:
- PostgreSQL on the host
- A Spring Boot application (Adara) running in a container with --net=host, listening on a port on localhost
- Apache HTTP Server with mod_proxy, mostly used for SSL termination
I realise this setup is thoroughly old-school, but it’s also one I’m deeply familiar with, and it simply works.
Now, why am I even bothering to run my app in a Docker container, when I could just as easily run it from systemd? Well, mostly to make deployments and rollbacks simpler, but also… because I wanted zero-downtime deployments!
But how can you do zero-downtime deployments in a setup like this? Let me show you.
The Makefile
Make is a beautifully versatile tool. Most of my application is built with Maven, but I use make to easily script around it. Make is the least opinionated build tool I know; it will let you do pretty much anything you want, as long as the tools you call use proper Unix exit codes.
DOCKER_CMD = docker build -q -t adara:$(GIT_SHA) -t adara:latest adara-backend
GIT_SHA = $(shell git rev-parse --short HEAD)
SERVER := adara.staging
# assumed definition: the image ID that step ② writes to the .sha file
IMAGE_SHA = $(shell cat docker-images/docker-image-$(GIT_SHA).sha)

# ⑦
deploy-server: docker push run-deployment git-tag

# ⑥
docker: docker-images/docker-image-$(GIT_SHA).tar.gz

# ③
docker-images/docker-image-$(GIT_SHA).tar.gz: docker-images/docker-image-$(GIT_SHA).sha
	docker save adara:$(GIT_SHA) | gzip > docker-images/docker-image-$(GIT_SHA).tar.gz

# ②
docker-images/docker-image-$(GIT_SHA).sha: docker-images build
	$(DOCKER_CMD) > docker-images/docker-image-$(GIT_SHA).sha

# ①
docker-images:
	mkdir -p docker-images

# ④
push: docker
	ssh $(SERVER) mkdir -p adara-deployment
	scp docker-images/docker-image-$(GIT_SHA).tar.gz $(SERVER):adara-deployment/
	ssh $(SERVER) "cat adara-deployment/docker-image-$(GIT_SHA).tar.gz | gunzip | docker load"

# ⑤
run-deployment:
	scp adara-deployment/server/deploy.sh "$(SERVER):adara-deployment/deploy.sh"
	rsync -av target/deploy/ $(SERVER):adara
	ssh $(SERVER) 'docker tag $(IMAGE_SHA) adara:$(GIT_SHA)'
	ssh $(SERVER) 'docker tag $(IMAGE_SHA) adara:latest'
	ssh -t $(SERVER) 'adara-deployment/deploy.sh $(GIT_SHA)'
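With these targets in place, a full deployment is a single command. SERVER defaults to adara.staging, and since it’s an ordinary Make variable, a command-line assignment still takes precedence (the production hostname below is just a placeholder):

# deploy to the default target defined in the Makefile
make deploy-server

# or point it at another host (placeholder name)
make deploy-server SERVER=adara.production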
If you’re not familiar with Makefiles, this can look a bit daunting, so let me go through it step by step. Note that Make builds a dependency tree, and then starts at the lowest levels.
So, if A depends on B, and B depends on C, first C will be built, then B, then A. To make this a little clearer, I’ve labeled the steps in the order in which they’re executed.
- ① This simply creates the docker-images folder where we’ll store the image before sending it to the server.
- ② This builds the image (it depends on the build step, which runs the Maven build; a sketch of that target follows this list).
- ③ Then, we use the docker save command to get a tar file of the image. We run that through gzip for compression.
- ④ This one has several steps. We use ssh to remotely create a folder, then copy the tar.gz file we created into it. Finally, we use docker load to load the image into the Docker daemon on the server, so it can be used to run containers.
- ⑤ This is where the actual deployment happens. We copy a shell script to the server, and then tag the Docker image we just loaded, so it can be referred to by its git SHA. Finally, we run the shell script we just uploaded.
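The build target that step ② depends on isn’t shown in the listing above; it just runs the Maven build. A minimal sketch of what it could look like (the exact Maven goals and flags are an assumption):

# assumed Maven wrapper target; adjust goals/flags to your project
build:
	mvn -B clean package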
The actual deployment
So far, most of this was just about getting the new version of our software onto the server, but we still need to run it.
To do that, these things need to happen:
- Find a free port on localhost for the new container to listen on
- Start the new container on that port
- Wait for the new container to become healthy
- Switch the Apache mod_proxy config to start sending requests to the new port
- Bring down the old container
This is all achieved by this script:
#!/bin/bash

# ①
function find_port() {
    base_port=8080
    increment=1
    port=$base_port
    # keep incrementing until nothing is listening on the port
    isfree=$(netstat -taln | grep -i listen | grep ":$port ")
    while [[ -n "$isfree" ]]; do
        port=$((port+increment))
        isfree=$(netstat -taln | grep -i listen | grep ":$port ")
    done
    echo "$port"
}

# ②
kill_old_pending() {
    old_pending=$(docker container ls --all --filter=name=adara-pending --format "{{.ID}}")
    if [ -n "$old_pending" ]; then
        # kill any old pending containers left over from a failed deployment
        echo "killing old pending container $old_pending"
        docker rm -f $old_pending
    fi
}

function start_adara() {
    # ③
    old_container=$(docker container ls --all --filter=name=adara-backend --format "{{.ID}}")

    # ④
    port="$(find_port)"
    container_id=$(docker run --net=host --name=adara-pending --restart unless-stopped -v /opt/adara:/opt/adara -v /var/www:/var/www -d -e server_port=$port adara:$version)
    echo "started container $container_id on port $port"

    # ⑤
    echo "waiting for container to come up..."
    if ! curl --silent --retry-connrefused --retry 45 --retry-delay 1 --fail http://localhost:$port/actuator/health > /dev/null; then
        echo "container failed to start"
        exit 1
    fi

    # ⑥
    echo "retrieving status"
    status=$(curl --silent http://localhost:$port/actuator/health | jq -r .status)
    echo "got status: $status"
    if [ ! "$status" = "UP" ]; then
        echo "unexpected status: $status"
        exit 1
    fi

    # ⑦
    echo "updating apache config"
    for i in /etc/apache2/sites-available/*.conf.template
    do
        target=$(echo $i | grep -oP '.*(?=\.template)')
        echo "writing file $target"
        sudo sh -c "cat $i | ADARA_PORT=$port APACHE_LOG_DIR=/var/log/apache2 envsubst > $target"
    done
    sudo systemctl reload apache2

    # ⑧
    echo "stopping old container..."
    if [ -n "$old_container" ]; then
        docker stop $old_container
        docker rm $old_container
    fi

    # ⑨
    docker container rename $container_id adara-backend
    echo "deployment done"
}

version=$1
if [ -z "$version" ]; then
    version="latest"
fi

kill_old_pending
start_adara
- ① This is a utility function, which uses netstat to find a free port, starting at 8080.
- ② If, for some reason, a previous deployment failed, we might have an old container called adara-pending still running. If that’s the case, kill it.
- ③ Find the ID of the currently running container, and store it for later.
- ④ Find a free port, and start a new container on that port, under the temporary name adara-pending.
- ⑤ Run curl with the --retry-connrefused --retry 45 --retry-delay 1 options to retrieve the actuator health endpoint. This makes the script wait until the actuator is available and responding.
- ⑥ Grab the actual response, and use jq to get the status field. If the status isn’t ‘UP’ by that point, assume the deployment failed and exit with an error.
- ⑦ Update the Apache config files to point to the new port.
- ⑧ Bring down the old container.
- ⑨ Rename the new container from adara-pending to adara-backend.
With that, the deployment is successful, and we’re running the new container.
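As a bonus, this also gives me simple rollbacks: every image is tagged with its git SHA, and deploy.sh takes that SHA as its argument. Rolling back is, in sketch form, just re-running the script on the server with an older tag that’s still loaded there (the SHA below is a placeholder):

# re-deploy a previously pushed image by its short git SHA (placeholder value)
ssh adara.staging 'adara-deployment/deploy.sh abc1234'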
Apache config
Now, step ⑦ needs a bit more explanation. It uses envsubst to do simple templating.
For each of my Apache configuration files, I have a .template file, which is just the config file, but instead of the concrete port, it references the ADARA_PORT environment variable. If we run this through envsubst, those values will be substituted, resulting in a valid config file.
<IfModule mod_ssl.c>
    <VirtualHost *:443>
        ServerName foobar.com
        ServerAdmin info@foobar.com

        CustomLog ${APACHE_LOG_DIR}/foobar.com.log combined

        SSLCertificateFile /etc/letsencrypt/live/foobar.com/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/foobar.com/privkey.pem

        Protocols h2 http/1.1

        ProxyPreserveHost on
        ProxyPass / http://localhost:$ADARA_PORT/
        ProxyPassReverse / http://localhost:$ADARA_PORT/
    </VirtualHost>
</IfModule>
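To check what a rendered config will look like, you can run the same substitution by hand before deploying (a sketch; the template path and port are placeholders):

# render a template to stdout with a test port
ADARA_PORT=8081 APACHE_LOG_DIR=/var/log/apache2 \
  envsubst < /etc/apache2/sites-available/foobar.com.conf.template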
Some thoughts
At this point, you may be thinking: “Why the hell don’t you just run k8s?”, and that’s a very valid question. At the moment, my needs are simply too small to justify running a cluster.
And yes, this took some manual work on my part, but the upside is that I know exactly how this solution works. It’s simple, predictable and straightforward.
Plus: I learned a bunch of new things, and isn’t that what it’s ultimately all about?