
Dev Report: Declarative Backups and Restore

Our goal with Clan is to give users control over their data. However, with great power comes great responsibility, and owning your data means you also need to take care of backups yourself.

In our experience, setting up automatic backups is often tedious because it requires custom integration between the backup software and the services that produce the state. The restore matters even more than the backup: restores are often poorly tested or documented, and a restore that does not work renders the backup useless.

In Clan, we want to make backup and restore a first-class citizen. Every service should describe what state it produces and how it can be backed up and restored.

In this article, we will discuss how our backup interface in Clan works. The interface allows different backup software to be used interchangeably and allows module authors to define custom backup and restore logic for their services.

First Comes the State

Our services are built from Clan modules. Clan modules are essentially NixOS modules, the basic configuration components of NixOS. However, we have enhanced them with additional features provided by Clan and restricted certain option types to enable configuration through a graphical interface.

Each module declares the state its service produces. In a simple case, this is just a list of directories, such as what we define for our ZeroTier VPN service:

  clan.core.state.zerotier.folders = [ "/var/lib/zerotier-one" ];

For other systems, we need more complex backup and restore logic. For each state, we can provide custom command hooks for backing up and restoring.

In our PostgreSQL module, for example, we define preBackupCommand and postRestoreCommand to use pg_dump and pg_restore to back up and restore individual databases:

preBackupCommand = ''
  # ...
  runuser -u postgres -- pg_dump ${compression} --dbname=${} -Fc -c > "${current}.tmp"
  # ...
'';
postRestoreCommand = ''
  # ...
  runuser -u postgres -- dropdb "${}"
  runuser -u postgres -- pg_restore -C -d postgres "${current}"
  # ...
'';
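
To show how folders and hooks fit together in a single state definition, here is a minimal sketch for an imaginary service. The service name "myapp", its paths, and the use of SQLite are assumptions made for illustration only; the option names are the ones shown above. It also assumes that the hooks run as root with systemctl available, which may not match the real environment.

```nix
{ lib, pkgs, ... }:
let
  # Hypothetical paths for an imaginary service called "myapp".
  liveDb = "/var/lib/myapp/data.db";
  dumpDir = "/var/backup/myapp";
  sqlite3 = "${lib.getBin pkgs.sqlite}/bin/sqlite3";
in
{
  clan.core.state.myapp = {
    # Only the dump directory is handed to the backup provider.
    folders = [ dumpDir ];

    # Write a consistent copy of the live database before the backup runs.
    preBackupCommand = ''
      mkdir -p ${dumpDir}
      ${sqlite3} ${liveDb} ".backup '${dumpDir}/data.db'"
    '';

    # Once the provider has restored the dump directory, put the copy back.
    postRestoreCommand = ''
      systemctl stop myapp.service
      cp ${dumpDir}/data.db ${liveDb}
      systemctl start myapp.service
    '';
  };
}
```

As in the PostgreSQL example, the backup provider never reads the live database file directly; it only sees the dump that the pre-backup hook produced.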

Then the Backup

Our CLI unifies the different backup providers in one interface.

As of now, we support backups using BorgBackup and a backup module called "localbackup" based on rsnapshot, optimized for backup on locally attached storage media.

To use different backup software, a module needs to set the options provided by our backup interface. The following Nix code is a toy example that uses the tar program to perform backup and restore to illustrate how the backup interface works:

  clan.core.backups.providers.tar = {
    list = ''
      echo /var/lib/system-back.tar
    '';
    create = let
      # Collect the folders of every declared state, without duplicates
      uniqueFolders = lib.unique (
        lib.flatten (lib.mapAttrsToList (_name: state: state.folders) config.clan.core.state)
      );
    in ''
      # FIXME: a proper implementation should also run `state.preBackupCommand` of each state
      if [ -f /var/lib/system-back.tar ]; then
        # Archive exists: append only files newer than the archived copies
        tar -uvpf /var/lib/system-back.tar ${builtins.toString uniqueFolders}
      else
        tar -cvpf /var/lib/system-back.tar ${builtins.toString uniqueFolders}
      fi
    '';
    restore = ''
      IFS=':' read -ra FOLDER <<< "''$FOLDERS"
      # Write one path per line, as tar -T expects
      printf '%s\n' "''${FOLDER[@]}" > /run/folders-to-restore.txt
      tar -xvpf /var/lib/system-back.tar -C / -T /run/folders-to-restore.txt
    '';
  };
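
The FIXME in the toy provider says that a proper implementation should also run the preBackupCommand of each state. One way this could be approached is sketched below as a standalone expression. The `state` attribute set is a made-up stand-in for `config.clan.core.state`, and the sketch assumes that states without a hook have `preBackupCommand` set to null; the actual option type may differ.

```nix
let
  lib = (import <nixpkgs> { }).lib;

  # Stand-in for config.clan.core.state; names and values are made up.
  state = {
    zerotier = {
      folders = [ "/var/lib/zerotier-one" ];
      preBackupCommand = null;
    };
    postgresql = {
      folders = [ "/var/backup/postgres" ];
      preBackupCommand = ''
        echo "dumping databases"
      '';
    };
  };

  # Keep only the states that define a hook (assumption: null means "no hook").
  statesWithHook = lib.filterAttrs (_name: s: s.preBackupCommand != null) state;

  # Join all hooks into one shell snippet, labelled by state name.
  preBackupHooks = lib.concatStringsSep "\n" (
    lib.mapAttrsToList (name: s: ''
      # pre-backup hook for ${name}
      ${s.preBackupCommand}
    '') statesWithHook
  );
in
preBackupHooks
```

With a `<nixpkgs>` channel on the Nix path, this can be evaluated with `nix-instantiate --eval --strict`. In a provider, the resulting string could be interpolated at the top of the `create` script, before the tar invocation.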

For better real-world implementations, check out the implementations for BorgBackup and localbackup.

What It Looks Like to the End User

After following the guide for configuring a backup, users can use the CLI to create backups, list them, and restore them.

Backups can be created through the CLI like this:

clan backups create web01

BorgBackup will also create backups itself every day by default.

Completed backups can be listed like this:

clan backups list web01

One cool feature of our backup system is that it is aware of individual services/applications. Let's say we want to restore the state of our Matrix chat server; we can just specify it like this:

clan backups restore --service matrix-synapse web01 borgbackup

In this case, it will first stop the matrix-synapse systemd service, then delete the PostgreSQL database, restore the database from the backup, and then start the matrix-synapse service again.

Future Work

So far, we have implemented backup and restore for a handful of services, and we expect to refine the interface as we test it with more applications.

Currently, our backup implementation backs up filesystem state from running services. This can lead to inconsistencies if applications change the state while the backup is running. In the future, we hope to make backups more atomic by backing up a filesystem snapshot instead of live directories. This, however, requires a modern filesystem that supports snapshots.