git-annex is a recent tool allowing to manage files with git without having theirs contents checked into git.
I've been looking at it with some interest in the past few weeks in order to use it to handle my music collection on the various computers I use.
My laptop just got a new SSD drive which is quite small and cannot handle the whole collection. By using git-annex, I'm now able to only check out part of my music collection and keeping it synchronized with all my others computers.
Using git-annex is pretty trivial once you understand the concepts.
I've set up a a git repository in my Music directory, like that:
% git init Initialized empty Git repository in /home/jd/Music
Then I initialized git annex:
% git annex init "Music on keller"
So now I'm able to import all my data into git annex. git-annex offers several backends to store data. WORM is the default, but is not a good choice for file that could be modified, and music files can be, at least when editing the metadata. Therefore, I chose the slow but more reliable SHA1 backend.
% git annex add --backend=SHA1 .
That can be long, so you should be patient. Then you should commit the fact you annexed files:
% git commit -a -m "Imported my music collection" [master 82d060f] Imported my music collection
Now, it's done. You can go on another computer and clone the repo:
% git clone ssh://keller/~/Music % cd Music % git annex get "Incubus - Light Grenades"
And then let the magic happens. It will retrieve the whole directory for you. Do not forgot to commit the fact that you got those album on that machine!
The concept used by git-annex is quite simple. All files are replaced by symlinks pointing to .git-annex/SHA1:filechecksum.log. This file contains lines, whose format is:
retrieval-timestamp is-file-present uuid-of-the-repository
Indicating if the file is present in a repository. This logs file are commited into the repository, whereas the real content is stored in the .git/annex directory and therefore not commited.