Duplicity is used to create compressed, encrypted backups. Backups can be written to local storage or to a remote server, with support for a variety of back ends including SFTP, WebDAV and S3. It's straightforward to use on a server from the command line, and there's a graphical desktop interface (Deja-dup) in Ubuntu.
It's a push system that uses GnuPG to encrypt the backups so that they are secure. By default it will use symmetric (traditional) encryption, but it can also be set to use a public key.
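As a sketch of the two modes (the paths, passphrase and key ID below are placeholders, not from this article), with each command printed rather than executed:

```shell
#!/bin/sh
# 'run' prints each command instead of executing it.
run() { echo "$@"; }

# Symmetric (default): duplicity prompts for a passphrase, or reads $PASSPHRASE.
export PASSPHRASE='example-passphrase'
run duplicity /home/steve file:///mnt/backup

# Public key: encrypt the archive to a GnuPG key instead (hypothetical key ID).
run duplicity --encrypt-key ABCD1234 /home/steve file:///mnt/backup
```

With a public key the backup can be encrypted on a machine that never holds the secret key; only the restoring machine needs it.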
It creates a chain with a full backup and a set of incremental (delta) backups, which means you can recover to the point in time when each incremental was taken. A downside of this is that to restore you need the entire chain: the full backup plus every incremental up to that point.
It can be used on Mac and there is a re-implementation available for Windows. At the simplest level you can store back-ups on any remote server that supports SFTP. Rsync.net is often mentioned as a backup specific hosting service to use it with.
Some common use-cases for Duplicity are:
There are lots of tools for local backup (e.g. rsync). The advantage Duplicity has is that it's easy to tell it to do a full backup and then create successive incremental backups. Some of the front-ends for it (e.g. Duply) will manage the backup policy for you.
Anywhere you fear that the backups might be accessible to someone else, such as an Internet-accessible server: with Duplicity, encryption is built in.
Transient systems can be told to back up when they come online, as it's a push system. This is particularly useful for systems that travel a lot, like laptops.
Any system could be broken into, particularly those that are Internet accessible.
It compares the backup you already have against your source and then only backs up changes since last time (an incremental backup). This saves time and space, making it easier to do backups regularly.
Particularly a concern if you're paying for space remotely.
It's a push system so you can script it for when a system comes on-line or to run at specific times of the day.
Duplicity uses standard tools like tar, rsync and GnuPG so you can recover if there's a problem.
It can't resume a half-completed backup - if you stop it you have to start from the beginning again. This is problematic if you need to do one large backup to a remote site but don't have enough bandwidth. The best mitigation is the --asynchronous-upload option, which separates creating the archive volumes from uploading them to the remote server.
Restoring files is a bit more complicated than is ideal. It won't restore a file over the currently existing file by default - you have to force it. While this is a bit fiddly it's probably a good thing.
The way it builds up knowledge of changes is to back up the changes since the last incremental backup - so it builds up a chain over time. The problem with this is that if you do a lot of incremental backups and then run into a problem half-way through the chain, you risk losing everything after the break. The ideal would be to condense backups into a differential: just the full backup plus the changes since then. The take-away is that you should do full backups fairly often to keep the chain of dependencies short.
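One handy way to keep the chain short automatically is the --full-if-older-than option. A sketch (the paths are placeholders), with the command printed rather than executed:

```shell
#!/bin/sh
# 'run' prints the command instead of executing it.
run() { echo "$@"; }

# Behaves as an incremental backup, but automatically starts a fresh full
# backup whenever the last full one is more than 30 days old.
run duplicity --full-if-older-than 30D /home/steve file:///mnt/backups
```

Scheduled from cron, this gives you regular fulls without having to remember to run them by hand.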
There are a lot of backup tools out there. The ones that have stood the test of time and stability that I've looked at are:
Rsync is best used for copying to another drive (for example a USB drive) or to a remote system. It's very efficient and easy to use for copying and synchronising. You can use other utilities for things like encryption. The algorithm is the basis for other systems such as Rdiff-backup and Duplicity.
Bacula is a full backup system designed for complex environments. If you have multiple different physical and virtual systems (e.g. Ubuntu, Windows, VMware) and have complex needs, then this or something a bit simpler like Dirvish are good choices.
Rdiff-backup is used for remote incremental backups or mirrors. It can be used across a local network or across the Net to another server. It is a push tool so the client kicks it off when it wants to start a backup. The main difference to Duplicity is that encryption is not integrated.
It's in the Ubuntu archive so it's as simple as:
$ sudo apt-get install duplicity
The basic command is:
$ duplicity [source dir] [dest dir]
When run, Duplicity detects whether there's a backup chain in the destination directory. If there isn't, it does a full backup of the files in [source dir], putting the archive in [dest dir]. Subsequent runs create incremental backups of whatever has changed since the last backup. The [dest dir] can be a URL:
$ duplicity /home/steve file:///mnt/backup/200810
Following backups which specify the same collection [dest dir] will be incremental unless a full backup is forced with the 'full' command.
This specifies that duplicity should do a full backup and ignore any other full or incremental backups that are at [dest url]:
$ duplicity full [source dir] [dest url]
This means you can have a full backup and a few incrementals stored at a location, and can then add another full backup to the same location:
$ duplicity full /home/steve/ file:///mnt/backups
To list the files that are currently in the backup:
$ duplicity list-current-files <url>
Note that this is using the signature files so it's not actually testing whether the backup archive is in good shape - you use the verify command for that.
It only shows you the files in the last backup - if that was an incremental it won't show a file that was only captured in the last full backup. To get around this you have to list all the files in your collection using the --time option. For example, if a full backup was done on 1st May 2012 and an incremental on 7th May 2012, then to get a full manifest of files the commands would be:
$ duplicity collection-status file:///mnt/201205-backup
$ duplicity list-current-files file:///mnt/201205-backup | tee /tmp/backupfile.lst
$ duplicity list-current-files --time 2012-05-01 file:///mnt/201205-backup | tee --append /tmp/backupfile.lst
$ grep /some/path/file-I-need /tmp/backupfile.lst
Annoyingly you have to do this for each backup you did - so the full and each incremental.
This tells duplicity to restore files from the remote URL to a local directory. It knows it's a restore because the remote URL comes first and the local directory second:
$ duplicity [remote url] [local dir]
Imagine that you need to restore your full home directory:
$ duplicity scp://email@example.com//home/steve /home/steve
Note that if you're restoring a specific file, you specify a filename to restore it as. You can also restore an entire directory, and it will be restored to the directory you give as the destination.
Duplicity automatically enters restore mode when the remote collection URL comes before the local location:
$ duplicity --file-to-restore [relative path and file] [source collection url] [destination location]
For example, if we want to restore the file 'Projects/specific-file.txt' we would do the following:
$ duplicity --file-to-restore Projects/specific-file.txt file:///mnt/20080507-backup /home/steve/tmp/backup-file-txt
If you are doing incremental backups every day then you can restore an earlier version of a file as follows:
$ duplicity -t <time> --file-to-restore <file> [remote url] [local dir]
For example, lets say we have a document that we've been altering but we want the one from 3 days previously:
$ sudo duplicity -t 3D --file-to-restore docs/article1.rst scp://firstname.lastname@example.org//home/steve /home/sg/tmp/restored-files
See the TIME formats section of the manual for ways to specify time.
With --file-to-restore you have to give the path relative to the root of the directory backed-up.
In some cases you first need to know which versions of a file are available to restore. The easiest way is to list all the files at the times you're interested in and grep for the one you want:
$ sudo duplicity list-current-files --time 3D file:///mnt/backups > /tmp/file.list
$ sudo duplicity list-current-files file:///mnt/backups >> /tmp/file.list
$ grep file-to-restore.rst /tmp/file.list
Newer versions of duplicity have a --file-changed option (used with collection-status) which supports this a lot better.
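On versions that support it, usage would look roughly like this sketch (the file path and URL are placeholders; check your version's man page for the exact spelling), with the command printed rather than executed:

```shell
#!/bin/sh
# 'run' prints the command instead of executing it.
run() { echo "$@"; }

# List the backup times at which this one file changed within the collection.
run duplicity collection-status --file-changed Projects/specific-file.txt file:///mnt/backups
```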
To check that a backup has completed correctly and that the archive is in order you use the verify command. It compares what's in the backup at [remote url] with the current backup contents at [source url]:
$ duplicity verify --verbosity 9 [remote url] [source url]
An easy example:
$ duplicity verify --verbosity 9 file:///mnt/backups/proj1 /home/user/proj1
You can also specify that you would like to verify a specific file with the --file-to-restore option:
$ sudo duplicity verify --file-to-restore /home/sg/file.txt --verbosity 9 [remote collection url] [folder]
You can check the collection status at a specific backup location, including how many full and incremental backups have been done:
$ duplicity collection-status [collection url]
$ duplicity collection-status --verbosity 9 file:///mnt/20080507-backup
Commonly you will have full backups on a regular basis and incremental backups between: for example full weekly and incremental daily. To clean-up space in your backup collection you can remove old incremental backups:
$ duplicity remove-all-inc-of-but-n-full <count> [collection url]
For example, using our scheme above, we could remove all incremental backups more than four weeks old, while leaving the weekly full backups in place:
$ sudo duplicity remove-all-inc-of-but-n-full 4 file:///mnt/2015backups/
A value of 1 means keep only the most recent backup chain (the last full and any incrementals).
Over time your backups location will have a set of full backups, and incrementals from each. These will take up a lot of space so you'll want to prune the old ones. You do this with:
$ sudo duplicity remove-all-but-n-full <count> [collection url]
Imagine that your collection has weekly full backups and then incrementals every day. You want to remove backups that are older than 12 weeks:
$ sudo duplicity remove-all-but-n-full 12 file:///mnt/2015backups
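Put together, a cleanup pass for the weekly-full/daily-incremental scheme might look like this sketch (the URL is a placeholder; note that duplicity requires --force before it will actually delete anything), with the commands printed rather than executed:

```shell
#!/bin/sh
# 'run' prints each command instead of executing it.
run() { echo "$@"; }
DEST=file:///mnt/backups

# Keep only the 12 most recent full chains.
run duplicity remove-all-but-n-full 12 --force "$DEST"

# Of those, keep incremental backups only for the newest 4 chains.
run duplicity remove-all-inc-of-but-n-full 4 --force "$DEST"
```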
This is not a complete list, but the most useful ones are:
When uploading with the asynchronous option, or verifying a backup a lot of temporary space can be used. The easiest solution is to use the --tempdir option to specify an alternative location to use for temporary files.
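For example (the alternative temporary directory is a placeholder), with the command printed rather than executed:

```shell
#!/bin/sh
# 'run' prints the command instead of executing it.
run() { echo "$@"; }

# Put duplicity's temporary files on a filesystem with more free space.
run duplicity verify --tempdir /var/tmp/duplicity file:///mnt/backups /home/steve
```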
When restoring files you can specify a time. The most useful options are:
Intervals, which are fairly simple:
- h - hours
- D - days
- W - weeks
- M - months
YYYY/MM/DD or YYYY-MM-DD
If it's anything more complex you can use the datetime format; refer to the TIME FORMATS section of the man page. For example, to remove backups older than 1 month you would do:
$ duplicity remove-older-than 1M file:///mnt/backups
If you wanted to restore a file from a back-up done on March 12th 2015 you could do:
$ duplicity restore --file-to-restore path/to/file.txt --restore-time 2015-03-12 file:///mnt/backups /home/steve/tmp/file.txt
Listing files shows what's backed up, but it's nice to see which files have been added or removed from one backup to the next. The best way I can see is:
$ sudo duplicity list-current-files --time 2015-07-28 file:///mnt/backups/2015duplicity | tee 20150728backupfiles.lst
$ sudo duplicity list-current-files --time 2015-08-20 file:///mnt/backups/2015duplicity | tee 20150820backupfiles.lst
$ grep --fixed-strings --line-regexp --invert-match --file 20150728backupfiles.lst 20150820backupfiles.lst
This finds all the files that are in the second (later) backup but not in the first one. As long as both are full backups it's a like-for-like comparison.
Treat this with caution as I don't have two complete backups to test this against.
Many uses of Duplicity involve storing the backed up files remotely. It makes sense to reduce the size of the backup as much as possible. There are a variety of options you can use to select or exclude files. When Duplicity runs it searches through the source directory and backs up all the files it finds using the file selection rules.
Each file is run through the rules in order, and as soon as a match is found Duplicity stops looking - the first match wins. The default is to include any file in the source tree it's been told to search, unless an exclusion rule matches first. For example:
+ /usr/local
- /usr/local/bin
This would back up everything in /usr/local including /usr/local/bin, because when Duplicity ran through the list of rules for files under /usr/local/bin/ it hit the inclusion of /usr/local first.
The take-away is you should define specific rules first for including/excluding directories, and then more general rules. I find the easiest way to think of it is as walking from deep in the directory hierarchy backwards.
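For instance, to actually exclude /usr/local/bin while still backing up the rest of /usr/local, the more specific exclusion must come before the broader inclusion:

```
- /usr/local/bin
+ /usr/local
```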
Most of the options accept extended shell globbing patterns:
*           Matches any string not containing /
?           Expands to any single character except /
[...]       Expands to a single character of those characters specified
**          Expands to any string of characters, whether or not it contains /
ignorecase: if the pattern starts with this prefix, the match is case insensitive
Practically, it is easiest to use the --exclude-globbing-filelist option, which reads the selection rules from a file.
As an example, if you wanted to back-up your documents folder but not the Dropbox subfolder you'd do:
- **Documents/Dropbox
+ Documents
- **
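Once the filelist is written it's worth previewing its effect before committing to a backup. A sketch (the filelist name and paths are placeholders), with the command printed rather than executed:

```shell
#!/bin/sh
# 'run' prints the command instead of executing it.
run() { echo "$@"; }

# --dry-run walks the file selection rules without writing any archive; with a
# high verbosity it reports what it would have done.
run duplicity --dry-run --verbosity 8 \
    --exclude-globbing-filelist backupexclude.txt \
    /home/steve file:///mnt/backups
```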
The final wrinkle is that if a rule matches a path, it also matches all the directories leading to it. For example, an include rule for /usr/local/myproject includes that directory and everything under it, but /usr/ and /usr/local/ are also included so that the path can be reached.
The contents of the globbing include/exclude file are very sensitive, as they are read line by line to build the rules. You cannot have comments in the file, or stray spaces/tabs that will confuse the rules. In vim, :set list shows all whitespace.
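Stray trailing whitespace is easy to catch with grep. This runnable example creates a demo filelist (the filename and contents are arbitrary) with a trailing space on the second line and flags the offending line:

```shell
#!/bin/sh
# Write a demo filelist; note the trailing space after '+ Documents'.
printf '%s\n' '- **Documents/Dropbox' '+ Documents ' '- **' > /tmp/backupexclude.txt

# Print any line that ends with a space or tab, prefixed with its line number.
grep -n '[[:blank:]]$' /tmp/backupexclude.txt
```

Here grep reports line 2, the one with the trailing space.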
If you have a lot of file rules it can be hard to see the impact of a change. By running a backup and keeping a copy of the file listing from each run, you can check what has changed with:
$ grep -F -x -v -f duplicity-run2.log duplicity-run1.log
Running a full back-up of an active system can be difficult as the files are changing as you're doing the backup. LVM snapshots are a great way to get around this as you can freeze a snapshot of the system at time N. It also lets you fully verify the backup after it is completed as none of the files in the LVM snapshot will change.
Assuming you have LVM up and running, an example of using it is:
$ sudo lvcreate -L10G -s -n homesnapshot /dev/vgUbuntu/home-ubuntu-lv
This creates a snapshot of /dev/vgUbuntu/home-ubuntu-lv with up to 10G of space for tracking differences.
Then mount the snapshot read-only so that it can be backed up. Mounting read-only means changes can't be made accidentally, so you can verify the backup later:
$ sudo mount -o ro /dev/vgUbuntu/homesnapshot /mnt/homesnapshot
Now to run the back-up, taking files from the snapshot:
$ cd /mnt/homesnapshot
$ sudo script -c "sudo nice -n 20 duplicity incremental --full-if-older-than 30D \
    --exclude-globbing-filelist /home/steve/share/backupexclude.txt \
    --volsize 200 --encrypt-key 682E675C --name localdrivebackup \
    --asynchronous-upload --verbosity 8 \
    /mnt/homesnapshot/ file:///mnt/backups/" /tmp/backup.log
For the inner duplicity command the main options are that we're telling it to do an incremental unless the last full back-up is over 30 days old (in which case it does a full back-up). We also tell it to exclude specific files, use GnuPG for encryption, name the specific backup (because we have more than one) and do the back-up upload asynchronously. The outer script -c part allows us to see the command on the terminal but also stores it in /tmp/backup.log for future checking.
After the back-up has completed we can verify that it has worked with:
$ script -c "sudo nice -n 20 duplicity verify --tempdir /mnt/tmpduplicity/ \
    --exclude-globbing-filelist /home/steve/share/backupexclude.txt \
    --verbosity 8 file:///mnt/backups/ /mnt/homesnapshot" /tmp/backupverify.log
In my particular case I often have to add some temporary space to allow Duplicity to verify the backup, so I use the --tempdir option. The last step is to remove the LVM snapshot:
$ cd
$ sudo umount /dev/vgUbuntu/homesnapshot
$ sudo lvremove /dev/vgUbuntu/homesnapshot
$ sudo lvs
Easy introduction to using Duplicity. Not particularly Ubuntu centric.
Article by Joe Brockmeier, covers using duplicity with S3 or SSH to a remote server.
Windows version of Duplicity written in .NET 2.0
Backing-up in a zero trust environment. Ensuring the client or the server remain secure if either is compromised. Very interesting ideas.
Write-up on S3, the author also covers back-up policy.
Two-part series on using Duplicity in a network set-up. Check that all the detail is up to date as it's from 2005: recommendations on SSH keys have probably changed since then.