Storing Git repositories in Amazon S3 for high availability

At VolunteerMatch we’re experimenting with using Chef Solo to manage Amazon EC2 servers. The catch is that if a server relies on Chef to boot up, then the Chef recipes (which we store in a Git repository) need to be highly available.

Here’s how we used a private S3 bucket to store our Git repository of Chef recipes. Thanks go to this post on using JGit to publish to S3, which got us started. The key differences are that we wanted a private S3 bucket, and that it took some experimenting to work out how to update an existing Git repo from S3 (via fetch and merge).

Download the JGit command-line program, rename it to jgit, and put it on your path (for example in $HOME/bin).

Set up the .jgit config file in your home directory and add the following (substituting your AWS keys):

vim ~/.jgit

accesskey: aws access key
secretkey: aws secret access key
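Since this file holds AWS credentials, it seems sensible to keep it readable only by your own user. That is our suggestion rather than something jgit requires. The sketch below writes the same two lines and restricts the file permissions. JGIT_CONFIG is a hypothetical variable used only by this snippet, and as far as we can tell jgit looks for the file named in the remote URL in your home directory, so the default path is the one to use in practice.

```shell
# JGIT_CONFIG is a hypothetical variable for this snippet only; jgit does not read it.
JGIT_CONFIG="${JGIT_CONFIG:-$HOME/.jgit}"

# Write the file inside a subshell with a restrictive umask,
# so a newly created file is never group- or world-readable.
(
  umask 077
  cat > "$JGIT_CONFIG" <<'EOF'
accesskey: aws access key
secretkey: aws secret access key
EOF
)

# Tighten permissions explicitly in case the file already existed.
chmod 600 "$JGIT_CONFIG"
```

Note that this overwrites any existing file at that path, and you still need to replace the two placeholder values with your real keys.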

Note that because we don’t specify acl: public in the .jgit file, the Git files on S3 will be private (which is what we wanted). Next, create an S3 bucket to store your repository in (let’s call it git-repos), and then create a Git repository to upload:

s3cmd mb s3://git-repos
mkdir chef-recipes
cd chef-recipes
git init
touch README
git add README
git commit -m "Add README" README
git remote add origin amazon-s3://.jgit@git-repos/chef-recipes.git

In the above I’m using the s3cmd command-line tool to create the bucket, but you can also do it via the Amazon web interface. Now let’s push it up to S3 (notice that we use jgit whenever we interact with S3, and standard git otherwise):

jgit push origin master

Now go somewhere else (e.g. cd /tmp) and try cloning it:

jgit clone amazon-s3://.jgit@git-repos/chef-recipes.git

When it comes time to update it, you do it in two steps, because jgit doesn’t support merge or pull: fetch with jgit, then merge with standard git.

cd chef-recipes
jgit fetch
git merge origin/master
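For a server that updates its recipes unattended, the two steps can be wrapped in a small shell function. This is only a sketch under a few assumptions: update_from_s3 and FETCH_CMD are names we made up for illustration, and we use git merge --ff-only instead of a plain merge so that the update fails loudly if the local copy has diverged from S3.

```shell
# Hypothetical helper: fetch from S3 with jgit, then fast-forward with standard git.
# Usage: update_from_s3 [repo_dir]   (repo_dir defaults to the current directory)
# The body runs in a subshell, so the caller's working directory is left untouched.
update_from_s3() (
  cd "${1:-.}" || exit 1

  # FETCH_CMD is a hypothetical override (handy for trying this without S3);
  # by default this runs "jgit fetch", exactly as in the steps above.
  ${FETCH_CMD:-jgit} fetch || exit 1

  # Only fast-forward: refuse to create a merge commit if history has diverged.
  git merge --ff-only origin/master
)
```

Calling update_from_s3 chef-recipes from the parent directory would then do the same thing as the three commands above, apart from the stricter merge.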