Duplicity

Introduction

Duplicity is used to create compressed, encrypted backups. Backups can be made to local storage or to a remote server, with support for a variety of back ends including SFTP, WebDAV and S3. It's straightforward to use on a server from the command line, and there's a graphical desktop interface (Deja-dup) in Ubuntu.

It's a push system that uses GnuPG to encrypt the backups so that they are secure. By default it will use symmetric (traditional) encryption, but it can also be set to use a public key.
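
As a sketch of the two modes (the paths and key ID below are placeholders), duplicity reads the symmetric passphrase from the PASSPHRASE environment variable if it is set, and switches to public-key encryption when --encrypt-key is given:

$ PASSPHRASE='my-secret' duplicity /home/steve file:///mnt/backup
$ duplicity --encrypt-key 682E675C /home/steve file:///mnt/backup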

It creates a chain with a full backup and a set of incremental (delta) backups, which means you can recover to the point in time when each incremental was taken. A downside of this is that to restore you need the entire chain of full and incremental backups.

It can be used on Mac and there is a re-implementation available for Windows. At the simplest level you can store backups on any remote server that supports SFTP. Rsync.net is often mentioned as a backup-specific hosting service to use it with.

Use-cases

Some common use-cases for Duplicity are:

  • Backing up to local storage on your system

There are lots of tools for local backup (e.g. rsync). The advantage Duplicity has is that it's easy to tell it to do a full backup and then create successive incremental backups. Some of the front-ends for it (e.g. Duply) will manage the backup policy for you.

  • Backing up to a remote public system

Anywhere you fear the backups might be accessible by someone else, such as an Internet-accessible server. With Duplicity, encryption is built into the capabilities.

  • Backup when a system comes on-line

As it's a push system, transient systems can be told to back up when they come on-line. This is particularly useful for systems that travel a lot, such as laptops.

Strengths

  • Encrypts backups so they can be sent remotely to an untrusted server

Any system could be broken into, particularly those that are Internet accessible.

  • Uses the rsync algorithm so it efficiently sends changes

It compares the backup that you already have against your source and then only backs up any changes since last time (an incremental backup). This saves time and space, making it easier to do backups regularly.

  • Compresses backups to save space

Particularly a concern if you're paying for space remotely.

  • Able to run unattended

It's a push system so you can script it for when a system comes on-line or to run at specific times of the day.
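
For example, a minimal crontab sketch (the schedule, passphrase and paths are all placeholders):

# Incremental at 02:30 every night, forcing a full when the last full is over 30 days old
30 2 * * * PASSPHRASE='my-secret' duplicity --full-if-older-than 30D /home/steve file:///mnt/backup >>/var/log/duplicity.log 2>&1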

  • Uses standard Unix tools

Duplicity uses standard tools like tar, rsync and GnuPG so you can recover if there's a problem.
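
In the worst case a volume can even be unpacked without Duplicity itself, since each volume is a GnuPG-encrypted tar file (full-backup volumes hold the files; incremental volumes hold rdiff deltas). A rough sketch, where the volume filename is a placeholder for whatever is in your backup directory:

$ gpg --decrypt duplicity-full.20150312T020000Z.vol1.difftar.gpg > vol1.difftar
$ tar -tf vol1.difftar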

Weaknesses

  • Can't resume half-completed backups

It can't resume a half-completed backup - so if you stop it you have to start from the beginning again. This is problematic if you need to do one large backup to a remote site but don't have enough bandwidth. The best workaround is --asynchronous-upload, which splits creating the archive from uploading it to the remote server.
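
A sketch of that workaround (the paths and host are placeholders); --tempdir points the volume staging area at somewhere with plenty of room:

$ duplicity --asynchronous-upload --tempdir /var/tmp/duplicity /home/steve sftp://sg@server.com//backups/steve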

  • You can't restore over a currently existing file

Restoring files is a bit more complicated than is ideal. By default it won't restore a file over a currently existing file - you have to force it. While this is a bit fiddly it's probably a good thing.
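
The override is the --force option; a sketch with placeholder paths:

$ duplicity --force file:///mnt/backup /home/steve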

  • You have to do full and incremental backups

The way it builds up knowledge of changes is to back up changes from the last incremental backup that you did - so it builds up a chain over time. The problem with this is that if you do a lot of incremental backups and then run into a problem half-way through the chain, you risk losing everything from that point on. The ideal would be to condense backups into a 'differential' - just the full backup plus the changes since then. The take-away is that you should do full backups fairly often to keep the chain of dependencies short.
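
One way to keep chains short automatically (placeholder paths): --full-if-older-than starts a fresh full backup whenever the last one is too old.

$ duplicity --full-if-older-than 30D /home/steve file:///mnt/backup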

Alternatives

There are a lot of backup tools out there. The ones I've looked at that have stood the test of time are:

  • Rsync

Rsync is best used for copying to another drive (for example a USB drive) or to a remote system. It's very efficient and easy to use for copying and synchronising. You can use other utilities for things like encryption. The algorithm is the basis for other systems such as Rdiff-backup and Duplicity.

  • Bacula

Bacula is a full back-up system designed for complex environments. If you have multiple different physical and virtual systems (i.e. Ubuntu, Windows, VMware) and have complex needs then this or something a bit simpler like Dirvish are good choices.

  • Rdiff-backup

Rdiff-backup is used for remote incremental backups or mirrors. It can be used across a local network or across the Net to another server. It is a push tool so the client kicks it off when it wants to start a backup. The main difference to Duplicity is that encryption is not integrated.

Installation

It's in the Ubuntu archive so it's as simple as:

$ sudo apt-get install duplicity

Basic usage

Initial backup

The basic command is:

$ duplicity [source dir] [dest dir]

When run, Duplicity detects whether there's a backup chain in the destination directory. If there isn't, it does a full backup of the files in [source dir], putting the archive in [dest dir]. When it's run after this it will do an incremental backup of whatever has changed since the last backup. The [dest dir] can be a URL:

$ duplicity /home/steve file:///mnt/backup/200810

Subsequent backups which specify the same collection [dest dir] will be incremental unless a full backup is forced with the 'full' command.

Full backup

This specifies that duplicity should do a full backup and ignore any other full or incremental backups that are at [dest url]:

$ duplicity full [source dir] [dest url]

This means you can have a full backup and a few incrementals stored at a location, and can then add another full backup, storing it in the same location:

$ duplicity full /home/steve/ file:///mnt/backups

List files in backup

To list the files that are currently in the backup:

$ duplicity list-current-files <url>

Note that this is using the signature files so it's not actually testing whether the backup archive is in good shape - you use the verify command for that.

It only shows you the files in the last backup - if that was an incremental it won't show you a file that was only backed up in the last full backup. To get around this you have to list the files at each point in your collection using the time option. For example, if a full backup was done on the 1st May 2012 and an incremental on the 7th May 2012, then to get a full manifest of files the commands would be:

$ duplicity collection-status file:///mnt/201205-backup
$ duplicity list-current-files file:///mnt/201205-backup | tee /tmp/backupfile.lst
$ duplicity list-current-files --time 2012-05-01 file:///mnt/201205-backup | tee --append /tmp/backupfile.lst
$ grep /some/path/file-I-need /tmp/backupfile.lst

Note

Annoyingly you have to do this for each backup you did - so the full and each incremental.

Full restore

Tells Duplicity to restore files from the remote URL to a local directory. It knows it's a restore because the remote URL comes before the local directory:

$ duplicity [remote url] [local dir]

Imagine that you need to restore your full home directory:

$ duplicity scp://sg@server.com//home/steve /home/steve

Restore a specific file

Note that when restoring a specific file, the destination you give is the name the restored file will be saved under. You can also specify an entire directory and it will be restored to the directory you give as the destination.

Duplicity automatically enters restore mode when the remote collection location comes before the local location:

$ duplicity --file-to-restore [relative path and file] [source collection url] [destination location]

For example, if we want to restore the file 'Projects/specific-file.txt' we would do the following:

$ duplicity --file-to-restore Projects/specific-file.txt  file:///mnt/20080507-backup /home/steve/tmp/backup-file-txt

Restore from a specific time

If you are doing incremental backups every day then you can restore an earlier version of a file as follows:

$ duplicity -t <time> --file-to-restore <file> [remote url] [local dir]

For example, let's say we have a document that we've been altering but we want the one from 3 days previously:

$ sudo duplicity -t 3D --file-to-restore docs/article1.rst scp://sg@server.com//home/steve /home/sg/tmp/restored-files

See the TIME formats section of the manual for ways to specify time.

With --file-to-restore you have to give the path relative to the root of the directory backed-up.

In some cases you want to know which versions of a file are available to restore. The easiest way to find them is to list all the files at each backup time and grep for the file:

$ sudo duplicity list-current-files --time 3D file:///mnt/backups >/tmp/file.list
$ sudo duplicity list-current-files file:///mnt/backups >>/tmp/file.list
$ grep file-to-restore.rst /tmp/file.list

Note

Newer versions of duplicity have a --file-changed option (used with collection-status) which supports this a lot better.
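
A sketch, assuming a version with that option (the path and URL are placeholders):

$ duplicity collection-status --file-changed docs/article1.rst file:///mnt/backups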

Verify a backup

To check that a backup has completed correctly and that the archive is in order you use the verify command. It compares what's in the backup at [remote url] with the current contents of [local dir]:

$ duplicity verify --verbosity 9 [remote url] [local dir]

An easy example:

$ duplicity verify --verbosity 9 file:///mnt/backups/proj1 /home/user/proj1

You can also specify that you would like to verify a specific file with the --file-to-restore option:

$ sudo duplicity verify --file-to-restore file.txt --verbosity 9 [remote collection url] [folder]

Check collection status

You can check the collection status at a specific backup location, including how many full and incremental backups have been done:

$ duplicity collection-status [collection url]

For example:

$ duplicity collection-status --verbosity 9 file:///mnt/20080507-backup

Remove old incremental backups

Commonly you will have full backups on a regular basis and incremental backups in between: for example, full weekly and incremental daily. To clean up space in your backup collection you can remove old incremental backups:

$ duplicity remove-all-inc-of-but-n-full <count> [collection url]

For example, using our scheme above, we could remove all incremental backups over four weeks old, while leaving the weekly full backups in place:

$ sudo duplicity remove-all-inc-of-but-n-full 4 file:///mnt/2015backups/

A value of 1 means keep only the most recent backup chain (the last full and any incrementals).

Remove old backups

Over time your backup location will have a set of full backups, and incrementals from each. These will take up a lot of space so you'll want to prune the old ones. You do this with:

$ sudo duplicity remove-all-but-n-full <count> [collection url]

Imagine that your collection has weekly full backups and then incrementals every day. You want to remove backups that are older than 12 weeks:

$ sudo duplicity remove-all-but-n-full 12 file:///mnt/2015backups
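
Note that as a safety net the remove commands just list what they would delete; add --force to make them actually remove the files. Repeating the example above:

$ sudo duplicity remove-all-but-n-full 12 --force file:///mnt/2015backups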

Useful options

This is not a complete list, but the most useful ones are:

--asynchronous-upload
Upload asynchronously, which is good for slow connections. Note that you will need a large temp file location (see --tempdir).
--dry-run
Calculate what would be done but don't perform any back end actions.
--encrypt-key <key-id>
Use the specified public key to encrypt with, otherwise it will use standard symmetric encryption. Note that you can specify multiple encryption keys in any format supported by GnuPG.
--exclude <shell_pattern>
Exclude the file or files matched by the pattern. A file of exclude patterns can also be given with --exclude-filelist.
--exclude-filelist <file>
Exclude the files in the specified file. Note that it will not use regular expressions in this file.
--exclude-globbing-filelist <file>
Read globbing patterns from the specified file, using the same rules as for include and exclude. See the notes below.
--exclude-other-filesystems
Exclude files from other file systems.
--full-if-older-than <time>
Do a full backup (even if an incremental is requested) if the last full backup is older than <time>.
--file-to-restore <path>
Restore the named file rather than the whole tree. The path is relative to the root directory that was backed up.
--gpg-options <"option">
Allows you to send GnuPG options, for example to strengthen the level of encryption you could use "--cipher-algo=AES256"
--name symbolicname
Sets the name of the backup. If you are running more than one distinct backup you are encouraged to use this option.
--no-encryption
Don't use GnuPG encryption, just send gzipped archives.
--progress
Output progress on upload and estimated upload time.
--sign-key <key-id>
Sign the encrypted volumes with the specified key, given as the numeric key ID (e.g. 682E675C).
--tempdir <directory>
Use the specified temp directory.
--volsize <number>
The default volume size is 5MB. You can set it to whatever you want; 100MB is reasonable. It needs approximately twice the volsize in /tmp to do the backup.
--verbosity <0-9>
Set the verbosity of the output. Level 4 (notice) is the default; 8 (info) is useful and 9 is debug.
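
As a sketch pulling several of these options together (every path, name and key ID here is a placeholder):

$ duplicity --name homebackup --encrypt-key 682E675C \
    --full-if-older-than 30D --volsize 100 \
    --exclude '/home/steve/.cache' --asynchronous-upload --verbosity 8 \
    /home/steve sftp://sg@server.com//backups/steve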

Issues

  • How do I get more temporary space?

When uploading with the asynchronous option, or when verifying a backup, a lot of temporary space can be used. The easiest solution is to use the --tempdir option to specify an alternative location for temporary files.

  • How do I specify times?

When restoring files you can specify a time. The most useful options are:

now

intervals, which are pretty naive:

  • h - hours
  • D - days
  • W - weeks
  • M - months

YYYY/MM/DD or YYYY-MM-DD

If it's anything more complex you can use the datetime format; refer to the man page. For example, to remove backups older than 1 month you would do:

$ duplicity remove-older-than 1M file:///mnt/backups

If you wanted to restore a file from a back-up done on March 12th 2015 you could do:

$ duplicity restore --file-to-restore path/to/file.txt --restore-time 2015-03-12 file:///mnt/backups /tmp/file.txt

  • How can I see which files have been added or removed from one back-up to the next?

Listing files shows what's backed up, but it's nice to see what files have been removed/added from one back-up to the next. The best way I can see is:

$ sudo duplicity list-current-files --time 2015-07-28 file:///mnt/backups/2015duplicity | tee 20150728backupfiles.lst
$ sudo duplicity list-current-files --time 2015-08-20 file:///mnt/backups/2015duplicity | tee 20150820backupfiles.lst
$ grep --fixed-strings --line-regexp --invert-match --file 20150728backupfiles.lst 20150820backupfiles.lst

This finds all the files that are in the second, later backup but not in the first one. As long as both are full backups it's a like-for-like comparison.

Note

Treat this with caution as I don't have two complete backups to test this against.

Common configuration

Ignoring files or directories

Many uses of Duplicity involve storing the backed up files remotely. It makes sense to reduce the size of the backup as much as possible. There are a variety of options you can use to select or exclude files. When Duplicity runs it searches through the source directory and backs up all the files it finds using the file selection rules.

Each file is run through the rules in order and as soon as a match is found Duplicity stops looking - the first match wins. The default is to include any file in the source tree it's been told to search, unless an exclusion rule matches first. For example:

+ /usr/local
- /usr/local/bin

This would back up everything in /usr/local including /usr/local/bin, because as Duplicity ran through the list of rules for the /usr/local/bin directory it hit the inclusion of /usr/local first.

The take-away is you should define specific rules first for including/excluding directories, and then more general rules. I find the easiest way to think of it is as walking from deep in the directory hierarchy backwards.

Most of the options accept extended shell globbing patterns:

*             Matches any string not containing /
?             Matches any single character except /
[...]         Matches a single character from those specified
**            Matches any string of characters, whether or not it contains /
ignorecase:   If the pattern starts with this prefix, matching is case insensitive

Practically, it is easiest to use --exclude-globbing-filelist. To create an exclude globbing file there are three steps:

  1. Exclude the specific files or directories you don't want
  2. Include the specific directories that you want
  3. Exclude everything else

As an example, if you wanted to back-up your documents folder but not the Dropbox subfolder you'd do:

- **Documents/Dropbox
+ Documents
- **
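
Assuming those three rules are saved in (say) /home/steve/backupexclude.txt, and Documents lives under /home/steve, the run would look like:

$ duplicity --exclude-globbing-filelist /home/steve/backupexclude.txt /home/steve file:///mnt/backup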

The final wrinkle is that if a rule matches then it also matches all the directories in the path. For example:

--include /usr/local/myproject

This includes /usr/local/myproject and everything under it; the directories on the path to it (/usr and /usr/local) are also included so the tree can be recreated.

Note

The contents of the globbing-include or globbing-exclude file are very sensitive as they are read line by line for the rules. You cannot have any comments in the file or unnecessary spaces/tabs, as these will confuse the rules. In vim, do :set list to see all white space.

If you have a lot of file rules it can be hard to see the impact of your changes. By running a back-up, keeping a copy of the file listing, and comparing it against the listing from the previous run, you can check what has changed:

$ grep -F -x -v -f duplicity-run2.log duplicity-run1.log

Advanced configuration

Integration with LVM

Running a full back-up of an active system can be difficult as the files are changing as you're doing the backup. LVM snapshots are a great way to get around this as you can freeze a snapshot of the system at time N. It also lets you fully verify the backup after it is completed as none of the files in the LVM snapshot will change.

Assuming you have LVM up and running, an example of using it is:

$ sudo lvcreate -L10G -s -n homesnapshot /dev/vgUbuntu/home-ubuntu-lv

This creates a snapshot of /dev/vgUbuntu/home-ubuntu-lv with up to 10G of space for differences.

Then mount the snapshot read-only so that it can be backed up. The purpose of mounting it read-only is so that changes aren't made accidentally; that way you can verify the back-up later:

$ sudo mount -o ro /dev/vgUbuntu/homesnapshot /mnt/homesnapshot

Now to run the back-up, taking files from the snapshot:

$ cd /mnt/homesnapshot
$ sudo script -c "sudo nice -n 20 duplicity incremental --full-if-older-than 30D --exclude-globbing-filelist /home/steve/share/backupexclude.txt \
--volsize 200 --encrypt-key 682E675C --name localdrivebackup --asynchronous-upload --verbosity 8 /mnt/homesnapshot/ file:///mnt/backups/" /tmp/backup.log

For the inner duplicity command the main options tell it to do an incremental unless the last full back-up is over 30 days old (in which case it does a full back-up). We also tell it to exclude specific files, use GnuPG for encryption, name this particular backup (because we have more than one) and upload asynchronously. The outer script -c part lets us see the output on the terminal while also storing it in /tmp/backup.log for future checking.

After the back-up has completed we can verify that it has worked with:

$ script -c "sudo nice -n 20 duplicity verify --tempdir /mnt/tmpduplicity/ --exclude-globbing-filelist /home/steve/share/backupexclude.txt \
--verbosity 8 file:///mnt/backups/ /mnt/homesnapshot" /tmp/backupverify.log

In my particular case I often have to add some temporary space to allow Duplicity to verify the back-up, so I use the --tempdir option. The last step is to remove the LVM snapshot:

$ cd
$ sudo umount /dev/vgUbuntu/homesnapshot
$ sudo lvremove /dev/vgUbuntu/homesnapshot
$ sudo lvs

See: LVM Snapshots with Duplicity
