UPDATE Oct 2019: since the recent v7 repository version of git annex, the problem (and solution) described below are not that relevant anymore. See the largefiles feature for example.
For managing my photo (and video) collection, which is too large to fit on my laptop drive, I use Git Annex. It's a nerdy solution (a fair amount of git knowledge is required), but I like that I can sync a whole tree of files between multiple devices/backends without requiring that all content is present everywhere. For example: my repo covers more than 300GB in pictures and videos in total, but only 14GB of that is present on my laptop's disk at the moment.
Git annex add
To add files to a git annex repo, you have to use git annex add $filename on the command line. You have to be careful not to forget the annex part there. If you forget it (not unlikely if git add is baked into your muscle memory), you'll store the content in the git repo instead of the git-annex extension of it.
This means that this content will be recorded in the git history and
will end up on all clones, even if you remove it in later commits.
In contrast, git annex does its magic by storing the content in a different place
than the normal git repo, and only storing symlinks in the git repo.
Bottom line: if you accidentally use git add instead of git annex add, you ruin the whole point of using git annex, and it is very hard to fix such a mistake if you discover it too late.
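From git's point of view, the difference shows up in the index: a symlink is staged with mode 120000 and a regular file with mode 100644. The sketch below uses only plain git in a throwaway repository. The symlink is made by hand and only stands in for what git annex add would leave behind; its target path is hypothetical, and git-annex itself is not involved.

```bash
#!/bin/bash
# Sketch: compare how git stages a regular file and a symlink.
# Assumption: plain git only; the symlink imitates a git-annex pointer
# (its target path is hypothetical and does not need to exist).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# What an accidental "git add" of a photo would stage: a regular file.
printf 'pretend this is a large JPEG\n' > regular.jpg

# What "git annex add" would leave behind: a symlink into .git/annex/objects.
ln -s .git/annex/objects/hypothetical-key/hypothetical-key annexed.jpg

git add regular.jpg annexed.jpg
# Print "<mode> <path>" per staged entry: 100644 = regular file, 120000 = symlink.
git ls-files -s | awk '{ print $1, $4 }'
```

Only the 120000 entry keeps the real content out of the git history; the 100644 entry puts the whole file into the repository.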
Foolproofing myself with a git hook
To prevent myself from making such mistakes, I set up a git pre-commit hook.
In my case it's not too complex, because I only store "big files" to be git annex'ed in my repo. I just have to check that I commit symlinks and no real files.
This is my .git/hooks/pre-commit:
```bash
#!/bin/bash
# automatically configured by git-annex
# ("|| exit 1" makes a failure here abort the commit as well)
git annex pre-commit . || exit 1

###############################################################
# Prevent that real files are committed, only accept symlinks.
###############################################################

# Standard git pre-commit stuff to find what to take a diff against.
if git rev-parse --verify HEAD >/dev/null 2>&1
then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi

# Go through added files (--diff-filter=A) and check whether they are symlinks (test -h).
# To handle file names with spaces and possibly other weird characters, we use
# this funky "-z while IFS read" construct. Note that "read -d ''" needs bash,
# not a plain POSIX sh, and that the process substitution keeps the loop in
# the current shell, so "exit 1" really aborts the hook.
while IFS= read -r -d '' file; do
    if test ! -h "$file"; then
        echo "Aborting commit: for this git-annex repo we only want symlinks and this file is not: $file" >&2
        exit 1
    fi
done < <(git diff --cached --name-only --diff-filter=A -z "$against")
```
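The "-z while IFS read" construct from the hook can be tried in isolation. In this sketch printf stands in for git diff -z: it emits NUL-terminated names, including one with spaces and one with an embedded newline (all names are made up), and the loop gets each name back intact. Keep in mind that read -d '' is a bash feature, so this only works under bash, not under a strictly POSIX /bin/sh such as dash.

```bash
#!/bin/bash
# Sketch: NUL-delimited file names survive spaces and newlines.
# printf stands in for "git diff --cached --name-only -z"; the names are made up.
count=0
while IFS= read -r -d '' file; do
    count=$((count + 1))
    printf '[%s]\n' "$file"
done < <(printf '%s\0' 'plain.jpg' 'holiday 2019.jpg' $'two\nlines.jpg')
echo "names read: $count"
```

With a newline-separated list instead of -z, the last name would be split into two bogus entries, and a "while read" without IFS= would strip leading and trailing spaces.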
Note: the git annex pre-commit . part was the original hook implementation, added by git annex init, which I kept of course.
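To test-drive the symlink check without git-annex installed, one could install just its second half in a throwaway repository, then attempt a first (initial) commit with a regular file and another with a symlink. This is a sketch assuming bash and git are available. The repository, file names and commit identity are made up, and the git annex pre-commit . line is deliberately left out so that git-annex is not needed.

```bash
#!/bin/bash
# Sketch: exercise the symlink-only check in a throwaway repo (no git-annex needed).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Install only the symlink check (the "git annex pre-commit ." line is omitted).
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/bash
if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
else
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi
while IFS= read -r -d '' file; do
    if test ! -h "$file"; then
        echo "Aborting commit: not a symlink: $file" >&2
        exit 1
    fi
done < <(git diff --cached --name-only --diff-filter=A -z "$against")
EOF
chmod +x .git/hooks/pre-commit

# Hypothetical identity so the commits also work on a machine without git config.
commit() { git -c user.name=Test -c user.email=test@example.com commit -q -m "$1"; }

# A regular file should be rejected (initial commit, so the empty-tree branch is used).
printf 'not annexed\n' > photo.jpg
git add photo.jpg
if commit "regular file"; then regular_result=accepted; else regular_result=rejected; fi

# Replace it with a symlink (a stand-in for a git-annex pointer) and try again.
rm photo.jpg
ln -s .git/annex/objects/hypothetical-key photo.jpg
git add photo.jpg
if commit "symlink"; then symlink_result=accepted; else symlink_result=rejected; fi

echo "regular file: $regular_result, symlink: $symlink_result"
```

Running it should print "regular file: rejected, symlink: accepted", with the hook's abort message for the first attempt on stderr.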