[racket-dev] proposal for moving to packages: repository

From: Eli Barzilay (eli at barzilay.org)
Date: Mon May 20 18:27:34 EDT 2013

An hour and a half ago, Matthew Flatt wrote:
> I used to think that we'd take advantage of the package manager by
> gradually pulling parts out of the Racket git repo and making them
> packages.

(Generally, +1.  I'll reply just on the repository point here.)


> This plan has two prominent implications:
> 
>  * The current git repo's directory structure will change. [...]

I very strongly object to this.  While in theory git will follow
everything across the move, this requires extra work (such as passing
"--follow" to "git log") that most people won't know about, so one
result of all of this is going to be a loss of historical
information.  I think it's much better to move directly to several
repositories (IIUC, one repository for each suggested top-level
directory).
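To make the history-loss point concrete, here is a small shell sketch (assuming a POSIX shell and a reasonably recent git; the `collects/` and `pkgs/` paths are hypothetical stand-ins for an old and a new layout). It moves one file into a new top-level directory and compares `git log` on the new path with and without `--follow`.

```shell
#!/bin/sh
# Sketch: moving a file into a new top-level directory hides its
# history from a plain "git log <path>"; "--follow" is the extra step.
# All directory and file names here are hypothetical.
set -e

work=$(mktemp -d)
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # isolate from personal git config
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.org"

git init -q "$work/repo"
cd "$work/repo"

# Two commits of history under the old layout.
mkdir -p collects/foo
printf '#lang racket\n(provide x)\n(define x 1)\n' > collects/foo/main.rkt
git add collects
git commit -q -m "add foo"
printf '(define y 2)\n' >> collects/foo/main.rkt
git commit -q -a -m "edit foo"

# Restructure: move the file under a new top-level directory.
mkdir -p pkgs/foo
git mv collects/foo/main.rkt pkgs/foo/main.rkt
git commit -q -m "move foo into pkgs/"

echo "plain log:"
git log --oneline -- pkgs/foo/main.rkt            # only the move commit
echo "with --follow:"
git log --oneline --follow -- pkgs/foo/main.rkt   # all three commits
```

The plain log reports only the move commit; `--follow` is needed to see the two earlier ones, and it works only for a single file at a time.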

The only goal of the intermediate state seems to be providing some
gradual change before switching to submodules.  On one hand, I think
that the new layout will force people to learn how to deal with it;
on the other, it'll make people do the work twice, once for the
layout change and again for the switch to submodules.

So assuming that a gradual change is the goal, I think that there are
better ways to do that.  Here's a suggestion:

  * The main repository is split into the different repositories.
    Initially, this is done without any consideration for submodules,
    with the idea of having "advanced gitters" come up with their own
    solutions.

  * However, don't remove the main repository, just keep it as an
    aggregate of the content that is found in the split repositories.
    If the structure is going to be the same in all of them (ie, the
    same directories and files are in all as they are now in the
    single repository), then pulling changes from the new repos to the
    main one is going to be trivial to the point of being automated.

  * The new repos will not get mirrored on github.  This is because
    github repos come with a bunch of functionality that is best kept
    in a single place -- like wiki pages and issues.  (But see below.)

  * So the only difference would be for people who commit work to the
    main repo.  This can be done in various ways, depending on the
    developers who do these commits:

    - Advanced developers would have all of the repos and will push
      directly to them.  This group of people is likely to start
      small, and eventually have all of the core committers in it.
      ("Core" as in the people who push to the plt repo now.)  As I
      said above, this will likely involve some experimentation for
      these people, which will later get translated into easy setups
      that will allow more people to switch to it.

    - "Outsiders" can continue to work as usual: fork the main plt
      repo (mostly on github) and send pull requests.  The pull
      request will then be pushed by a core committer as it is done
      now, where the core committer pushes to the actual relevant
      repo, and that eventually propagates back to the main repo so
      that the contributor sees that the work was merged.  The merging
      should usually be trivial, except in extremely rare cases where
      the push touches on files from different new repos.  In these
      cases it should be possible to either split the commit into
      separate commits for the different repos, or ask the contributor
      to do that split.

    - The only people left are core committers who will work with the
      main repository.  I can see a bunch of ways to deal with this.
      First, the commit can be sent as a pull request to one of the
      advanced gitters, who will then apply it to the actual
      repository.  This is easier than it sounds: git has a bunch of
      commands for this, and for all practical purposes you'd just
      replace the "git push" part of your workflow with
      "git send-email".  I *think* (but I'm not 100% sure) that this
      work can be automated too, so it's fine if I (or some other
      excited soul) get these emails and merge them.

      There is an inconvenience here: once you send a pull request and
      it's merged, the commits that are actually merged (into the main
      repo, which you're using if you're in this group) are different
      objects.  This is nothing new; people who make all their
      contributions via pull requests already deal with it, since we
      have a policy of rebasing rather than merging.  Usually, when you
      pull from the updated repo, git should notice that your changes
      are already there.  (At least I hope it does.)

      Things will be less convenient for people who use git more
      intensely, e.g., if you have lots of branches.  But I think that
      such people really should just move to the first group sooner...

  * This stage can go on for a while, as the code & machinery involved
    evolves to a point of being smooth enough.  By smooth, I mean that
    - it is easy enough to build the whole thing as you do now,
    - nightly builds, drdr, etc., are all adapted to the multiple repos,
    - most people feel comfortable with multiple repos; specifically,
      people who will need to switch from working in the big repo,
      where only a small part is of actual interest to them, to a
      plain build of the whole thing with only the interesting part
      coming from a checked-out repository.  (These are currently core
      committers, and they are likely the last to switch to multiple
      repos.)

  * Once the new repos work fine for most people, switch to having
    them as the main place: start mirroring the repos on github (and
    elsewhere), and remove the monolithic one.
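The send-email route for core committers, and the hope that git notices already-merged changes, can both be checked locally.  The sketch below assumes a POSIX shell and git; it uses `git format-patch` to produce the patch file that `git send-email` would mail, `git am` to stand in for the advanced gitter applying it, and `git pull --rebase` to show the local copy of the commit being dropped.  All names and paths are hypothetical, and everything stays on the local disk.

```shell
#!/bin/sh
# Sketch: mail a commit as a patch instead of pushing it, have someone
# else apply it upstream, then check that "pull --rebase" drops the
# local copy.  Repo names, paths and people are hypothetical.
set -e

work=$(mktemp -d)
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # isolate from personal git config
export GIT_AUTHOR_NAME="Core Committer" GIT_AUTHOR_EMAIL="core@example.org"
export GIT_COMMITTER_NAME="Core Committer" GIT_COMMITTER_EMAIL="core@example.org"

# "upstream" stands in for the repository everyone pulls from.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo "v1" > README
git add README
git commit -q -m "initial"

# The committer's clone, with one local commit that is not pushed.
git clone -q "$work/upstream" "$work/mine"
cd "$work/mine"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "add fix"

# Instead of "git push": turn the commit into a mailable patch file.
# (git send-email would mail it; here we just keep the file.)
git format-patch -q -1 -o "$work/outbox"

# Meanwhile, upstream moves on with someone else's change.
cd "$work/upstream"
echo "v2" > README
git commit -q -a -m "unrelated change"

# An advanced gitter applies the mailed patch.  This creates a new
# commit object (different parent and committer) with the same diff.
GIT_COMMITTER_NAME="Advanced Gitter" git am -q "$work/outbox"/*.patch

# Back in the clone: the rebase skips the local commit because an
# equivalent patch is already upstream.
cd "$work/mine"
git pull -q --rebase
git log --oneline
```

This relies on rebasing: git-rebase skips a local commit when upstream already contains a commit with the same change, so the committer ends up on the upstream history without a duplicate.  A plain merge-based `git pull` would instead keep both copies of the commit.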

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.