[racket-dev] A story of a package split

From: Tony Garnock-Jones (tonyg at ccs.neu.edu)
Date: Tue Aug 13 16:22:19 EDT 2013

Hi all,

Matthias asked me to write a few words about an experience I had 
splitting a large repository of code up into smaller repositories and 
then building a mechanism to tie them together again.

== A short story ==

Once upon a time, RabbitMQ (www.rabbitmq.com) was held in a single, 
monolithic Mercurial repository, including the server, the Java client 
library, the .NET client library, the Erlang client library, the 
protocol codec compiler, the documentation, adapters for other related 
messaging protocols, and so on.

We decided, for various reasons, to split the monolithic repository into 
separate repositories. The approach we ended up taking was to have a 
single repository, the "umbrella", which included a Makefile and a 
handful of scripts that checked out, updated, and compiled a number of 
other repositories hosted in various places. You can still see the 
umbrella today here: 
http://hg.rabbitmq.com/rabbitmq-public-umbrella/file/default

The workflow for someone working on RabbitMQ is now:

1. Check out the umbrella, and `cd` into it.
2. Run `make checkout`.
3. Run `make`.
4. Edit, compile, debug, commit and push in the subdirectories resulting
    from step 2.
5. Occasionally run `make update` in the umbrella.

(There's also some ugly makefile machinery to do cross-subrepository 
dependency tracking to let `make` in a subrepo recompile just the right 
things. Mostly.)
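
As a rough illustration of what a checkout step like `make checkout` 
might boil down to, here is a minimal sketch. It is not the umbrella's 
actual Makefile logic. The function name `checkout_all` and the list 
file format (one `<directory> <url>` pair per line) are assumptions made 
for this example, and it uses git rather than Mercurial to match the 
`foreachrepo` variant discussed in this message.

```shell
#!/bin/sh
# Hypothetical sketch: clone every repository named in a list file,
# skipping those that are already checked out.
# Assumed list format: "<directory> <url>" per line; "#" starts a comment.
checkout_all() {
	list=$1
	while read -r dir url
	do
		# Skip blank lines and comment lines.
		case $dir in ''|'#'*) continue ;; esac
		if [ -d "$dir/.git" ]
		then
			echo "already present: $dir"
		else
			# Keep git away from the loop's stdin (the list file).
			git clone "$url" "$dir" </dev/null || return 1
		fi
	done < "$list"
}
```

With the function loaded into a shell, and `repos.txt` standing in for 
whatever list file you keep, it would be run from the umbrella directory 
as:

     $ checkout_all repos.txt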

Personally, I frequently use a script, `foreachrepo` (git variant 
attached) that lets me operate on all repositories found under the 
umbrella at once. For example,

     $ foreachrepo pwd

would tell me where all the checkouts live, and

     $ foreachrepo git status

would show me their status.

When a configuration is found that works nicely and is to be released, a 
tag is made across all the currently-checked-out repositories:

     $ foreachrepo git tag my_release_2.3.4
     $ foreachrepo git push --tags

The split into completely separate repositories, linked informally by 
action of a script, worked out well for RabbitMQ, and the RabbitMQ 
project seems to be living happily ever after.

== Comment ==

The problem addressed here is *configuration management*. RabbitMQ takes 
a very loose approach to configuration management, where individual 
modules evolve independently and are only connected to each other by 
happening to be in sibling directories within the umbrella. Tags are 
used to take a snapshot of a group of repositories at the same time.
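
To show what such a snapshot amounts to, the sketch below prints the 
current commit of every checkout under the umbrella. The function name 
`snapshot` is made up for this example. Like `foreachrepo`, it assumes 
git checkouts, and it does not handle paths containing whitespace.

```shell
#!/bin/sh
# Hypothetical sketch: print one "<checkout path> <commit id>" line for
# each git checkout found below the current directory.
snapshot() {
	for gitdir in $(find . -type d -name .git | sort)
	do
		repo=$(dirname "$gitdir")
		# Ask git for the commit currently checked out in this repo.
		rev=$(cd "$repo" && git rev-parse HEAD) || return 1
		echo "$repo $rev"
	done
}
```

A tag made across all repositories names the same information 
implicitly: each repository resolves the tag to its own commit.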

Another approach to configuration management uses an explicitly 
*versioned* manifest, where an umbrella repository names other 
repositories *and specific versions* of their contents to pull into 
scope. This approach is taken by systems like rebar, and is essentially 
how git submodules work.
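
To make the versioned-manifest idea concrete, here is a minimal sketch. 
It is not how rebar or git submodules are actually implemented. It 
assumes a hypothetical manifest file with one `<checkout path> 
<revision>` pair per line and checkouts that already exist on disk. The 
name `apply_manifest` is invented for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: move each listed checkout to its pinned revision.
# Assumed manifest format: "<checkout path> <revision>" per line,
# where the revision is anything git can check out (tag, branch, commit).
apply_manifest() {
	manifest=$1
	while read -r repo rev
	do
		# Skip blank lines and comment lines.
		case $repo in ''|'#'*) continue ;; esac
		echo "===== $repo -> $rev"
		# Stop at the first checkout that cannot be pinned.
		(cd "$repo" && git checkout --quiet "$rev") </dev/null || return 1
	done < "$manifest"
}
```

With the function loaded into a shell, and `release.manifest` standing 
in for whatever manifest file you keep under version control in the 
umbrella, a frozen configuration would be restored with:

     $ apply_manifest release.manifest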

You could frame the contrast between the two by saying that the RabbitMQ 
approach is essentially *optimistic*: configurations are frozen only 
when needed, at the risk of occasional frankenconfigurations during 
development (when you `git pull` one subrepo but not one of its 
siblings). The `git submodule` approach, by contrast, is *pessimistic*: 
a configuration stays frozen until it is explicitly moved forward to 
the next frozen configuration.

An intermediate form could be imagined, where the Makefile checks out 
specific versions or branches but otherwise leaves them free to evolve 
in a way `git submodule` prohibits.

Vincent has recently run into issues of configuration management: he 
wishes to assemble a specific collection of packages at specific 
versions to run a particular application (namely, some benchmarks).

Others on this list do similar things, assembling specific versions of 
libraries into complete applications.

I think it's interesting that both releasing applications and releasing 
the Racket system itself have this problem of describing a collection of 
related packages.

Cheers,
   Tony
-------------- next part --------------
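#!/bin/sh
# Run the given command in every git checkout found below the current
# directory, e.g. `foreachrepo git status`.
# Note: checkout paths containing whitespace are not handled.
for gitdir in $(find . -type d -name .git)
do
	repo=$(dirname "$gitdir")
	echo "===== $repo"
	# The subshell keeps the cd local; the command runs only if cd succeeds.
	(cd "$repo" && "$@")
	echo
done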

Posted on the dev mailing list.