[racket-dev] A story of a package split
Hi all,
Matthias asked me to write a few words about an experience I had
splitting a large repository of code up into smaller repositories and
then building a mechanism to tie them together again.
== A short story ==
Once upon a time, RabbitMQ (www.rabbitmq.com) was held in a single,
monolithic Mercurial repository, including the server, the Java client
library, the .NET client library, the Erlang client library, the
protocol codec compiler, the documentation, adapters for other related
messaging protocols, and so on.
We decided, for various reasons, to split the monolithic repository into
separate repositories. The approach we ended up taking was to have a
single repository, the "umbrella", containing a Makefile and a handful
of scripts that check out, update, compile, and so on, a number of other
repositories from various places. You can still see the umbrella today
here: http://hg.rabbitmq.com/rabbitmq-public-umbrella/file/default
The workflow for someone working on RabbitMQ is now:
1. Check out the umbrella, and `cd` into it.
2. Run `make checkout`.
3. Run `make`.
4. Edit, compile, debug, commit and push in the subdirectories resulting
from step 2.
5. Occasionally run `make update` in the umbrella.
(There's also some ugly makefile machinery to do cross-subrepository
dependency tracking to let `make` in a subrepo recompile just the right
things. Mostly.)
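As a rough illustration of step 2, a checkout step of this kind can be
pictured as a loop over a list of repository names. This is only a
sketch and not the real umbrella code: the base URL, the repository
names and the `checkout_all` function are invented placeholders, and it
assumes git rather than Mercurial.

```shell
#!/bin/sh
# Hypothetical sketch of a "make checkout"-style step; not the real umbrella code.
# BASE_URL and REPOS are invented placeholders, not real RabbitMQ locations.
BASE_URL=${BASE_URL:-https://example.org/repos}
REPOS=${REPOS:-"server java-client codegen"}

# Clone each repository named in REPOS into a subdirectory of the
# umbrella, leaving existing checkouts alone.
checkout_all() {
    for name in $REPOS
    do
        if [ -d "$name" ]; then
            echo "skipping $name (already checked out)"
        else
            git clone "$BASE_URL/$name" "$name" || return 1
        fi
    done
}
```

Calling `checkout_all` from the umbrella directory would then populate
the subdirectories that step 4 works in.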
Personally, I frequently use a script, `foreachrepo` (git variant
attached) that lets me operate on all repositories found under the
umbrella at once. For example,
    $ foreachrepo pwd
would tell me where all the checkouts live, and
    $ foreachrepo git status
would show me their status.
When a configuration is found that works nicely and is to be released, a
tag is made across all the currently-checked-out repositories:
    $ foreachrepo git tag my_release_2.3.4
    $ foreachrepo git push --tags
The split into completely separate repositories, linked informally by
the action of a script, worked out well for RabbitMQ, and the RabbitMQ
project seems to be living happily ever after.
== Comment ==
The problem addressed here is *configuration management*. RabbitMQ takes
a very loose approach to configuration management, where individual
modules evolve independently and are only connected to each other by
happening to be in sibling directories within the umbrella. Tags are
used to take a snapshot of a group of repositories at the same time.
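To make the idea of a snapshot concrete, the sketch below prints the
commit each checkout under the current directory is at. It is
illustrative only: the `snapshot` function and its one-line-per-repo
output format are assumptions, not something RabbitMQ uses.

```shell
#!/bin/sh
# Hypothetical sketch: report the revision currently checked out in
# each repository below the current directory.
# Output format (invented): <directory> <commit id>, one line per repo.
# Paths containing whitespace are not handled.
snapshot() {
    for gitdir in $(find . -type d -name .git | sort)
    do
        repo=$(dirname "$gitdir")
        # rev-parse HEAD prints the id of the commit that is checked out.
        echo "$repo $(git -C "$repo" rev-parse HEAD)"
    done
}
```

Unlike a tag, such a listing lives outside the repositories themselves,
so it describes a configuration without modifying any of them.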
Another approach to configuration management uses an explicitly
*versioned* manifest, where an umbrella repository names other
repositories *and specific versions* of their contents to pull into
scope. This approach is taken by systems like rebar, and is essentially
how git submodules work.
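A minimal sketch of the versioned-manifest idea follows. The manifest
format (one `<directory> <revision>` pair per line) and the
`apply_manifest` function are invented for illustration; this is not how
rebar or `git submodule` store their data.

```shell
#!/bin/sh
# Hypothetical sketch of a versioned manifest; the file format is
# invented. Each non-comment line reads: <directory> <revision>

# Move every checkout listed in the manifest to its pinned revision.
apply_manifest() {
    manifest=$1
    while read -r dir rev || [ -n "$dir" ]
    do
        case $dir in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        git -C "$dir" checkout --quiet "$rev" || return 1
    done < "$manifest"
}
```

With a manifest containing, say, `server v2.3.4`, running
`apply_manifest manifest` in the umbrella would put the `server`
checkout at exactly that tag, and committing the manifest would version
the configuration as a whole.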
You could frame the contrast between the two by saying that the RabbitMQ
approach is essentially *optimistic*: configurations are frozen only
when needed, at the risk of occasional frankenconfigurations during
development (when you `git pull` one subrepo but not one of its
siblings). The `git submodule` approach, by contrast, is *pessimistic*:
configurations stay frozen until explicitly moved forward into the next
frozen configuration.
An intermediate form could be imagined, where the Makefile checks out
specific versions or branches but otherwise leaves them free to evolve
in a way `git submodule` prohibits.
Vincent has recently run into issues of configuration management: he
wishes to assemble a specific collection of packages at specific
versions to run a particular application (namely, some benchmarks).
Others on this list do similar things, assembling specific versions of
libraries into complete applications.
I think it's interesting that both releasing applications and releasing
the Racket system itself have this problem of describing a collection of
related packages.
Cheers,
Tony
-------------- next part --------------
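#!/bin/sh
# Run the given command in every git checkout found below the current
# directory. Paths containing whitespace are not handled.
for gitdir in $(find . -type d -name .git)
do
    repo=$(dirname "$gitdir")
    echo "===== $repo"
    # Subshell keeps the cd local; skip the command if the cd fails.
    (cd "$repo" && "$@")
    echo
done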