git as a general purpose backup utility
Monday 3rd December 2007

When it was first suggested to me that you could just use git for backup I was not convinced. For one thing, you would end up with massive .git directories in high-level places on your filesystem.

Now that I’ve had some time to reflect on the possibility, I think that perhaps it isn’t such a crazy idea. It’s not actually true that you have to have a .git directory in the place that you want to back up. In fact, I am even trialling git alongside my regular “tar” based backup.

Here’s what I do. Suppose, for the sake of example, that I’m going to back up /home onto a separate backup partition called /backup.

Step 1 – Create a git repository for the backup

mkdir /backup/home.git
git --git-dir=/backup/home.git --work-tree=/home init

[
I used to do this as follows before I discovered the --work-tree option to git. It has the same effect.

cd /backup/home.git
git --bare init
git config core.bare false
git config core.worktree /home

]
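To check that the work tree really stays free of a .git directory, the same init can be tried on throwaway directories first. This is only a sketch: the temporary paths stand in for /home and /backup/home.git and are not part of the setup above.

```shell
#!/bin/sh
# Sketch: try the Step 1 layout in a scratch area instead of /home and /backup.
set -eu

sandbox=$(mktemp -d)             # hypothetical scratch location
work="$sandbox/home"             # stands in for /home
repo="$sandbox/backup/home.git"  # stands in for /backup/home.git

mkdir -p "$work" "$repo"
git --git-dir="$repo" --work-tree="$work" init

# The repository records that it is not bare and where its work tree is ...
git --git-dir="$repo" config core.bare
git --git-dir="$repo" config core.worktree

# ... and nothing was created inside the work tree itself.
ls -A "$work"
```

The two config values printed here are the same ones that the older three-command recipe sets by hand.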

Step 2 – Initial backup

cd /home
git --git-dir=/backup/home.git add .
git --git-dir=/backup/home.git commit -m "Initial /home backup"

Step 3 – Copy backups to a safe remote machine
Assuming that you have ssh access to a second machine with git installed where you want to store your backups, you can initialize a new empty git repository there for this purpose.
Suppose that this machine is called other-machine and the repository is located at /backup/first-machine/home.git.
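The empty repository on the receiving side could be created along these lines. It is a sketch rather than something the steps here prescribe: the init_backup_repo helper is hypothetical, and the ssh form shown in the comment assumes git is on the PATH of other-machine.

```shell
#!/bin/sh
# Sketch: create an empty bare repository to receive pushed backups.
set -eu

# Hypothetical helper: create the directory (and parents) and initialise it.
init_backup_repo() {
    mkdir -p "$1"
    git init --bare "$1"
}

# Local dry run in a scratch directory. On the real target the same two
# commands would be run for /backup/first-machine/home.git, for example via
#   ssh other-machine 'mkdir -p /backup/first-machine/home.git &&
#                      git init --bare /backup/first-machine/home.git'
scratch=$(mktemp -d)
remote_repo="$scratch/backup/first-machine/home.git"
init_backup_repo "$remote_repo"
```

A bare repository is a reasonable choice on the receiving side because nothing is ever checked out there; it only stores what is pushed.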

The initial remote backup is performed thus.

cd /backup/home.git
git remote add other-machine ssh://other-machine/backup/first-machine/home.git
git gc
git push other-machine master

The git gc seems fairly important. At this stage you have a massive git repository that hasn’t been packed yet. When you push it, git has to perform a big “Deltifying” step to build the pack that it transfers to the remote side. If you run git gc locally first, that work is done once and the result is stored as a pack on the local side. The git push can then reuse it, and subsequent local operations benefit from the local pack as well, whereas letting the push do the packing would throw that work away on the local side.
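The effect of git gc on a freshly committed repository can be observed with git count-objects. The following is a small sketch on scratch directories; the file names and the committer identity are placeholders, and a real /home would of course take far longer to pack.

```shell
#!/bin/sh
# Sketch: watch loose objects turn into a single pack when git gc runs.
set -eu

sandbox=$(mktemp -d)
work="$sandbox/home"
repo="$sandbox/home.git"
mkdir -p "$work" "$repo"

# Placeholder identity so the commit works on a machine with no git config.
export GIT_AUTHOR_NAME=backup GIT_AUTHOR_EMAIL=backup@example.invalid
export GIT_COMMITTER_NAME=backup GIT_COMMITTER_EMAIL=backup@example.invalid

git --git-dir="$repo" --work-tree="$work" init --quiet
for i in 1 2 3 4 5; do echo "file $i" > "$work/file$i.txt"; done

cd "$work"
git --git-dir="$repo" add .
git --git-dir="$repo" commit --quiet -m "Initial backup"

echo "before gc:"
git --git-dir="$repo" count-objects -v   # "count" = loose objects
git --git-dir="$repo" gc --quiet
echo "after gc:"
git --git-dir="$repo" count-objects -v   # objects now show up under "in-pack"
```

After the gc, the loose-object count drops to zero and the objects are reported as packed, which is the state you want the repository to be in before the first push.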

Step 4 – Incremental backup

cd /home
git --git-dir=/backup/home.git add .
git --git-dir=/backup/home.git commit -a -m "Incremental /home backup"

Performing both an “add” and a “commit -a” looks repetitive, but both are required: “commit -a” does not add new untracked files, and “add” doesn’t ‘add’ file deletions to the index.
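For unattended use, the add and commit pair can be wrapped so that a run with no changes does not count as a failure. This is a sketch under my own assumptions: the incremental_backup helper, the timestamped commit message, and the placeholder identity are all hypothetical, and the demo runs on scratch directories rather than /home.

```shell
#!/bin/sh
# Sketch: Step 4 wrapped in a function that is safe to run repeatedly.
set -eu

# Hypothetical helper. $1 = git directory, $2 = directory being backed up.
incremental_backup() (
    cd "$2"
    git --git-dir="$1" add .
    # "git commit" exits non-zero when there is nothing to commit,
    # so only commit when status reports a change.
    if [ -n "$(git --git-dir="$1" status --porcelain)" ]; then
        git --git-dir="$1" commit --quiet -a \
            -m "Incremental backup $(date '+%Y-%m-%d %H:%M:%S')"
    fi
)

# Demo on scratch directories standing in for /home and /backup/home.git.
sandbox=$(mktemp -d)
work="$sandbox/home"
repo="$sandbox/home.git"
mkdir -p "$work" "$repo"
export GIT_AUTHOR_NAME=backup GIT_AUTHOR_EMAIL=backup@example.invalid
export GIT_COMMITTER_NAME=backup GIT_COMMITTER_EMAIL=backup@example.invalid
git --git-dir="$repo" --work-tree="$work" init --quiet

echo one > "$work/kept.txt"
echo two > "$work/doomed.txt"
incremental_backup "$repo" "$work"   # first run commits both files

rm "$work/doomed.txt"                # a deletion ...
echo three > "$work/new.txt"         # ... and a new untracked file
incremental_backup "$repo" "$work"   # second commit records both changes
incremental_backup "$repo" "$work"   # nothing changed: no commit, no error

git --git-dir="$repo" ls-tree --name-only HEAD
```

The function body runs in a subshell, so the cd does not leak into the calling script.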

Step 5 – Push incremental backup to remote machine

cd /backup/home.git
git push other-machine master

Well, that was easy.

Disadvantages
The initial “git gc” step can be very slow.
git does not store owner/group information or atime and mtime information. The backup is content only.
“git add .” is not robust against files that disappear while git’s looking at them (e.g. lock files). It tends to fail with a “cannot stat” message when you really want it to not bother with that file and carry on.
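One way to reduce the “cannot stat” problem is to keep short-lived files out of the backup altogether by listing patterns in the repository’s info/exclude file, which works like a .gitignore but lives outside the backed-up tree. The sketch below assumes lock files match *.lock, which is a guess and will differ per system. git add also has an --ignore-errors option, but whether it covers a file vanishing mid-scan is something to test rather than assume.

```shell
#!/bin/sh
# Sketch: exclude transient files so "git add ." never looks at them.
set -eu

sandbox=$(mktemp -d)
work="$sandbox/home"             # stands in for /home
repo="$sandbox/home.git"         # stands in for /backup/home.git
mkdir -p "$work" "$repo"
git --git-dir="$repo" --work-tree="$work" init --quiet

# Patterns here behave like .gitignore entries. "*.lock" is a hypothetical
# pattern; adjust it to whatever transient files your system produces.
mkdir -p "$repo/info"
printf '%s\n' '*.lock' >> "$repo/info/exclude"

echo data > "$work/notes.txt"
echo 1234 > "$work/app.lock"     # pretend lock file

cd "$work"
git --git-dir="$repo" add .
git --git-dir="$repo" ls-files   # lists what was staged
```

This does not make “git add .” robust in general; it only narrows the set of files that can disappear underneath it.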
