Backing up self-hosted Docker environments

Like many geeks, I've got a small home server that I use to host various services and information that I'd rather not farm out to a random company, whether because the data is sensitive (password manager), the commercial option is expensive (file sharing), or it's just plain not something available for purchase (media hosting).

I've also had bad experiences with a tool becoming part of my daily workflow and then being canceled by the company, or the company going out of business, or the tool turning into an expensive commercial product with yet another endless subscription. So I like to host as much as is feasible myself: the absolute worst case is that I'm left running software that never gets updates, which still gives me a very, very long time to find a good replacement. And, of course, supporting and using open-source software over a proprietary option is always the way to go, if you have the ability to do so.

My software stack has gotten more and more complex over the past several months as I've added new and different applications to fill different needs (and software to monitor the software, to make sure it's all doing what it's supposed to), as well as software that does things I wasn't aware I could do: a recent example is Bazarr, which can download subtitles for all your media and makes for a much more pleasant experience if you need, or just prefer, to have subtitles enabled.

One of the drawbacks of adding more and more software is that I've invested more and more time into configuring all these pieces of software (which, at current count, stands at about 30 stacks comprising about 40 containers), as well as into any UI or other customization needed to make them feel more my own.

Clearly, a data loss would be a major pain at best and a complete catastrophe at worst, and when it comes to backing up Docker volumes and data, it isn't always clear what you need, where it lives, and how to grab the data you're most concerned about.

My Docker configuration is slightly unusual in that I've gathered all the volumes mounted by my containers into a single filesystem tree, so that I can simply back up the entire file structure and end up with a comprehensive, complete backup.

Docker itself does create volumes like this, but the naming scheme and location typically aren't very clear, and the volumes are owned by root, making access slightly more problematic for non-privileged users.
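If you're curious where those default named volumes actually end up, a quick way to check is the following (the volume name here is just a placeholder):

    # List the named volumes Docker has created on its own
    docker volume ls

    # Show where a given volume actually lives on disk; the path ends up
    # under /var/lib/docker/volumes/<name>/_data and is owned by root
    docker volume inspect some_app_data --format '{{ .Mountpoint }}'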

Essentially, my file tree looks similar to this:

/home
   -> /application1
      -> /config
      -> /data
   -> /application2
      -> /config
      -> /data
....

This puts all the data in a clear and easily understood layout, but how do you make Docker put the data there in the first place?

That's actually pretty easy: I use Portainer and deploy everything as a Docker compose stack, so I have a file that clearly outlines how I've configured each container and makes it easy to edit that configuration and redeploy if I need to. It also makes updating software easier, since you can either pull the new image via Portainer or stop and restart the stack via docker-compose.
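For reference, updating a stack from the command line is roughly the two commands below, run from the directory holding that stack's compose file (newer Docker installs use `docker compose` instead of `docker-compose`):

    # Pull any newer images referenced by the compose file
    docker-compose pull

    # Recreate the containers that have a newer image available
    docker-compose up -d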

Portainer isn't required for this type of volume layout; Docker compose files can do this on their own, I just enjoy some of the added features and usability Portainer provides.

The only modification needed is to the compose file, which many maintainers provide for ease of deployment. You simply adjust the volumes section:

    volumes:
      - /local/filesystem/path/config:/config
      - /local/filesystem/path/data:/data

To the left of the colon is the filesystem path on the host, e.g. /home/username, and to the right is the path as mounted inside the Docker container. Usually, the right-side path is already provided by the package maintainer if they supply a Docker compose file, or it can be found in the documentation.

The left can be anywhere on the host that you'd like, but I prefer the nested structure so I don't have to remember where I configured a particular piece of software to store its data.
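One small detail worth handling before the first deploy: if a bind-mount path doesn't exist, Docker will usually create it owned by root, so it's worth pre-creating the directories as a regular user, something along these lines, matching the tree above (the application names are just examples):

    # Create the per-application config/data directories up front so they're
    # owned by your user rather than by root
    mkdir -p /home/application1/{config,data} /home/application2/{config,data}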

All you need to do to back up the directory is make a tarball of the whole filesystem path and then stick the archive file somewhere safe. Restoring is simply doing it in the opposite direction: downloading and unarchiving the file. I also back up the Portainer configuration, as that includes all the Docker compose files and any other information I'd need to pull new images and redeploy on the Docker side.
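In practice that's a single tar invocation each way; something like this, with example paths rather than my exact ones:

    # Create a compressed archive of the entire application tree
    tar czf homeserver-backup.tar.gz /home/application1 /home/application2

    # Restore by extracting the archive back to the filesystem root
    # (tar strips the leading / when archiving, so -C / puts things back in place)
    tar xzf homeserver-backup.tar.gz -C /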

I've tested restoring from this backup method: it took about 10 minutes to download and untar the archive, then about another 10 minutes for Portainer to restore from its backup and pull all the software images. The end result was that everything was back up in the exact same state it was in when the backup archive was made, and relatively quickly for a bare-metal restore.

And, before you send me an angry email, I'm aware this method has a few flaws. One of them is that it DOES NOT guarantee a good backup of any RDBMS (Postgres, MySQL, MariaDB) running inside the containers, but my personal stack doesn't require or use any and relies only on SQLite databases, which can typically be copied into an archive safely with limited risk of data loss or inconsistency.

It's not 100% guaranteed, and I do make dumps of my self-hosted Bitwarden instance before backups since that data is critical, but for the rest of the containers, if one of the daily backups has a problem and I have to go back a day or a few to find a clean archive, it's not a major issue.
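For the databases that really do matter, one way to take such a dump from a SQLite-backed service is sqlite3's online backup command, which produces a consistent copy even if the application has the file open; the path here is purely hypothetical, and this isn't necessarily how I dump Bitwarden:

    # Take a consistent snapshot of a SQLite database before it goes into the archive
    sqlite3 /home/application1/config/app.db ".backup '/home/application1/config/app.db.bak'"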

I've automated this process via a cronjob that makes the archive and then uses rclone to back it up to Google Drive. The full backup size is roughly 2.5 GB and the whole process takes about 15 minutes and runs once a night.
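As a rough sketch of that automation (the script name, paths, schedule, and the "gdrive" rclone remote are placeholders, not my exact setup):

    #!/bin/sh
    # backup-homeserver.sh: archive the application tree and push it off-site
    set -e

    STAMP=$(date +%F)
    ARCHIVE="/tmp/homeserver-backup-$STAMP.tar.gz"

    # Archive everything under the application data tree
    tar czf "$ARCHIVE" /home/application1 /home/application2

    # Copy the archive to a Google Drive remote set up beforehand with `rclone config`
    rclone copy "$ARCHIVE" gdrive:homeserver-backups

    # Remove the local copy once it's uploaded
    rm -f "$ARCHIVE"

And the matching crontab entry to run it nightly:

    # Run the backup at 02:00 every night (edit with `crontab -e`)
    0 2 * * * /usr/local/bin/backup-homeserver.sh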

I certainly sleep much better at night knowing I have a relatively current copy of the home server's state stored off-site, and that in the case of a catastrophic failure, the step that takes the longest will be replacing failed hardware and installing Linux, not restoring all the applications and data.