UPDATE Oct 2019: since the recent v7 repository version of git annex, the problem (and solution) described below are not that relevant anymore. See the largefiles feature for example.

For managing my photo (and video) collection, which is too large to fit on my laptop drive, I use Git Annex. It's a nerdy solution (a fair amount of git knowledge is required), but I like that I can sync a whole tree of files between multiple devices/backends without requiring that all content is present everywhere. For example: my repo covers more than 300GB in pictures and videos in total, but only 14GB of that is present on my laptop's disk at the moment.

Git annex add

To add files to a git annex repo, you have to use git annex add $filename on the command line. You have to be careful not to forget the annex part there. If you forget it (not unlikely if git add is baked into your muscle memory), you'll store the content in the git repo itself, instead of in the git-annex extension of it. This means that the content will be recorded in the git history and will end up on all clones, even if you remove it in later commits. In contrast, git annex does its magic by storing the content in a different place than the normal git repo, and only storing symlinks in the git repo. Bottom line: if you accidentally use git add instead of git annex add, you ruin the whole point of using git annex, and it is very hard to fix such a mistake if you discover it too late.
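To see which of the two happened for a given file, you can look at how it is recorded in the git index: a symlink has mode 120000, a regular file has mode 100644 or 100755. The following is a minimal sketch, not something git annex provides. The function name is_staged_as_symlink is made up for illustration, and it assumes the symlink-based (locked) annex mode described in this post.

```shell
# Hypothetical helper (not part of git or git-annex): succeeds if the given
# path is recorded in the git index as a symlink (mode 120000).
is_staged_as_symlink() {
    # "git ls-files -s" prints "<mode> <object> <stage><TAB><path>",
    # so the first space-separated field is the mode.
    mode=$(git ls-files -s -- "$1" | cut -d' ' -f1)
    [ "$mode" = "120000" ]
}
```

After staging a file you could run, for example, is_staged_as_symlink holiday.jpg && echo annexed (holiday.jpg being a placeholder name). A failure means the file was either added with plain git add or is not staged at all.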

Foolproofing myself with a git hook

To prevent myself from making such mistakes, I set up a git pre-commit hook. In my case it's not too complex, because I only store "big files" to be git annex'ed in my repo. I just have to check that I commit symlinks and no real files.

This is my .git/hooks/pre-commit:

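#!/bin/bash
# Note: bash instead of plain sh, because the loop below uses "read -d",
# which is a bash extension (it fails under e.g. dash).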
# automatically configured by git-annex
git annex pre-commit .


###############################################################
# Prevent that real files are committed, only accept symlinks.
###############################################################
# Standard git pre-commit stuff to find what to take a diff against.
if git rev-parse --verify HEAD >/dev/null 2>&1
then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi
# Go through added files (--diff-filter=A) and check whether they are symlinks (test -h).
# To handle file names with spaces and possibly other weird characters, we use
# this funky "-z while IFS read" construct.
git diff --cached --name-only --diff-filter=A -z "$against" | while IFS= read -r -d '' file; do
    if test ! -h "$file"; then
        echo "Aborting commit: for this git-annex repo we only want symlinks and this file is not: $file" >&2
        exit 1
    fi
done

Note: the git annex pre-commit . part was the original hook implementation, added by git annex init, which I kept of course.
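A side note on portability: the read -d option used in the loop is a bash extension, so the check needs bash to work. If you want to stay with plain POSIX sh, a variant along the following lines might do. It is only a sketch: the function name check_added_are_symlinks is made up for illustration, and it assumes your xargs supports -0 (GNU and BSD xargs do, but it is not in POSIX).

```shell
# Hypothetical helper: fails if any newly added (staged) path is not a symlink.
# $1 is the tree-ish to diff against (HEAD, or the empty tree on a first commit).
check_added_are_symlinks() {
    # xargs -0 splits on NUL bytes, so odd file names survive intact.
    # xargs returns non-zero when the inner sh exits with a failure.
    git diff --cached --name-only --diff-filter=A -z "$1" |
        xargs -0 sh -c '
            for f do
                if [ ! -h "$f" ]; then
                    echo "Aborting commit: not a symlink: $f" >&2
                    exit 1
                fi
            done
        ' sh
}
```

In the hook, the while loop would then be replaced by something like check_added_are_symlinks "$against" || exit 1.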