Sunday, March 3, 2013

git-annex: encrypted remotes

Due to the data loss I blogged about, I had to reverse engineer the encryption git-annex uses for its encrypted special remotes. The file system on which the content lived has an 8 GB bullet hole in it, helpfully discarded by pvmove. It is pretty unhappy about that: parts of the git repository are unusable and some directories can no longer be accessed, so git-annex cannot run at all anymore.

However, I was still able to access the git-annex branch within that git repository (using porcelain). This branch contains a file called remote.log, which holds the keys of the special remotes. There is one key per remote, encrypted to a GPG key of your choice, and all files within that remote are encrypted with that same symmetric key.
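
A minimal Python sketch of pulling the per-remote settings out of remote.log. It assumes each line holds a remote's UUID followed by space-separated key=value pairs (such as name=, type=, cipher= and cipherkeys=); the function names are hypothetical. `git show git-annex:remote.log` reads the file straight from the branch without a checkout, which helps when the working tree is damaged. If a UUID appears on several lines, this sketch simply keeps the last one.

```python
import subprocess


def parse_remote_log(text):
    """Map each remote UUID to a dict of its key=value fields (assumed format)."""
    remotes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        uuid, pairs = fields[0], fields[1:]
        config = {}
        for pair in pairs:
            # Split on the FIRST '=' only: base64 values end in '=' padding.
            key, sep, value = pair.partition("=")
            if sep:
                config[key] = value
        remotes[uuid] = config  # a later line for the same UUID overrides
    return remotes


def read_remote_log(repo_path="."):
    """Read remote.log from the git-annex branch without checking it out."""
    return subprocess.run(
        ["git", "-C", repo_path, "show", "git-annex:remote.log"],
        check=True, capture_output=True, text=True,
    ).stdout
```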

One small detail stopped me from getting the decryption right the first time, though. It seems that git-annex uses randomness generated by GPG and armored into base64. In my naïveté I spotted the base64 and decoded it. Instead, the armored text is used verbatim: the first 256 bytes serve as the HMAC key (which reduces the randomness to 192 bytes, since each base64 character carries only 6 bits) and the remaining bytes as the passphrase for GPG's symmetric encryption (GPG then performs its own key derivation for CAST5 from it). A bug report about this has just been filed on the git-annex wiki.
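
A sketch of that split in Python. It assumes the cipher= value in remote.log is base64 of a GPG message encrypted to your key, and that what GPG prints when decrypting it is the armored cipher to be used as-is; the helper names are hypothetical and this is not git-annex's own code. The arithmetic behind the 192 bytes is 256 × 6 / 8.

```python
import base64
import subprocess

HMAC_PART = 256  # leading bytes of the armored cipher used as the HMAC key


def decrypt_cipher(cipher_field):
    """Recover the armored cipher from remote.log's cipher= value (assumed format).

    gpg reads the encrypted message from stdin; prompts for your private
    key's passphrase are left to gpg/the agent (stderr is not captured).
    """
    encrypted = base64.b64decode(cipher_field)
    return subprocess.run(
        ["gpg", "--decrypt"],
        input=encrypted, stdout=subprocess.PIPE, check=True,
    ).stdout


def split_cipher(cipher):
    """Split the armored cipher verbatim; do NOT base64-decode it first."""
    return cipher[:HMAC_PART], cipher[HMAC_PART:]


def entropy_bytes(n_chars):
    """Base64 encodes 6 bits per character, so n chars carry n*6/8 bytes."""
    return n_chars * 6 // 8
```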

With that knowledge, I wrote a little tool that generates the encrypted content keys from the plain keys used in the symlinks, which lets you locate a file in the encrypted remote. Fetch the file, then use the tool to decrypt it with the right key.
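
A rough Python approximation of what such a tool has to do (not the tool itself). It assumes the encrypted key is "GPGHMACSHA1--" followed by the hex HMAC-SHA1 of the plain key string, keyed with the first 256 bytes of the armored cipher, and that the rest of the cipher is fed to gpg as the passphrase. The function names are hypothetical, and the object may sit inside hash subdirectories in the remote depending on its type.

```python
import hashlib
import hmac
import subprocess

HMAC_PART = 256  # leading bytes of the armored cipher used as the HMAC key


def encrypted_key(plain_key, cipher):
    """Map a plain key (symlink target name) to its assumed encrypted name."""
    mac_key = cipher[:HMAC_PART]
    digest = hmac.new(mac_key, plain_key.encode("utf-8"), hashlib.sha1).hexdigest()
    return "GPGHMACSHA1--" + digest


def decrypt_file(encrypted_path, out_path, cipher):
    """Decrypt one fetched object, using the rest of the cipher as passphrase.

    gpg reads only the first line from --passphrase-fd, so this assumes the
    passphrase has no embedded newline. GnuPG 2.1+ may additionally need
    --pinentry-mode loopback in the argument list.
    """
    passphrase = cipher[HMAC_PART:]
    subprocess.run(
        ["gpg", "--batch", "--passphrase-fd", "0",
         "--output", out_path, "--decrypt", encrypted_path],
        input=passphrase, check=True,
    )
```

The plain key is the last path component of an annexed symlink's target, so `os.path.basename(os.readlink(path))` would give the input for `encrypted_key`.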

The lesson: really back up the git repository used with git-annex, and especially remote.log. I'm now missing most of the metadata, though luckily it is still present for some of the more important files. Recovering the file content itself does not depend on that metadata, as long as you can deduce a file's name from its content. With many small files, though, that may be rather futile.
