[racket-dev] proposal for moving to packages: repository

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Thu May 23 11:44:08 EDT 2013

Hi Eli: I'm trying to understand your point. Do I have this right?

Background: The git history consists of a series of checkpoints in time of
the entire repository, not a collection of individual files. So, when I do
"git log x.rkt" then what I get is essentially a filtered list (except where
people didn't properly rebase, but let's ignore that) of those checkpoints:
all the ones where "x.rkt" changed.
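
To make that concrete, here is a minimal sketch in a throwaway repository.
The file names, commit messages, and identity settings are made up for
illustration and have nothing to do with the real Racket tree; the point is
only that a path-limited log is a subset of the whole-repository
checkpoints.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Three whole-tree checkpoints; only the first and third touch x.rkt.
echo one > x.rkt; git add x.rkt; git commit -q -m "add x.rkt"
echo one > y.rkt; git add y.rkt; git commit -q -m "add y.rkt"
echo two > x.rkt; git add x.rkt; git commit -q -m "change x.rkt"

# Count all checkpoints, then only those where x.rkt changed.
all_commits=$(git rev-list --count HEAD)
x_commits=$(git rev-list --count HEAD -- x.rkt)
echo "all checkpoints: $all_commits"
echo "checkpoints that changed x.rkt: $x_commits"

# The same filtering, as a human-readable log.
git log --format='%h %s' -- x.rkt
```

The path after "--" is what does the filtering; git stores no per-file
history that this could be reading instead.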

Big Question: The issue is, then, when we split up the current repo into
smaller repos, what is the series of checkpoints that we're going to "make
up" for each of the individual repos? Right?

Your Advice: And, IIUC, you're suggesting that the best way to deal with
this question is to defer it until we are more sure of the actual split we
want to make. So we don't mess with the history at all and instead just
work at the level of some script that we can run to just use "mv" and
company to move things around. When we know exactly what ends up going
where, then we can figure out how to make up a new, useful history for the
separate repositories.
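
A minimal sketch of what I imagine such a script looking like, assuming a
hypothetical plain-text spec with one "package path" pair per line. The spec
format, the file name packages.spec, and the sample tree are my own
inventions for illustration; only the "core" and "everything-else" names come
from your message. It runs on an exported directory, so no history is
touched.

```shell
#!/bin/sh
set -e

# Hypothetical exported tree (no .git); paths are illustrative only.
work=$(mktemp -d)
cd "$work"
mkdir -p collects/racket collects/drracket
echo core-code > collects/racket/base.rkt
echo gui-code  > collects/drracket/main.rkt

# Hypothetical spec: one "package path" pair per line.
cat > packages.spec <<'EOF'
core collects/racket
everything-else collects/drracket
EOF

# Move each listed path under its package directory, keeping the
# relative path, so core/collects/racket mirrors collects/racket.
while read -r pkg path; do
  mkdir -p "$pkg/$(dirname "$path")"
  mv "$path" "$pkg/$path"
done < packages.spec

# Show where everything ended up.
find core everything-else -type f | sort
```

Changing which package a file belongs to is then a one-line edit to the
spec, with nothing recorded in git until the split is final.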

Is that the point?

Robby



On Thu, May 23, 2013 at 4:41 AM, Eli Barzilay <eli at barzilay.org> wrote:

> 9 hours ago, Matthew Flatt wrote:
> > At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote:
> > > That's true, but the downside of changing the structure and having
> > > files and directories move post structure change will completely
> > > destroy the relevant edit history of the files, since it will not
> > > be carried over to the repos once it's split.
> >
> > It's possible that we're talking past each other due to me not getting
> > this point.
>
> (Obligatory re-disclaimer: I consider the problem with forcing people
> to change their working environment much more severe.)
>
>
> > Why is it not possible to carry over history?
> >
> > The history I want corresponds to `git log --follow' on each of the
> > files that end up in a repository. I'm pretty sure that such a
> > history of commits can be generated for any given set of files, even
> > if no ready-made tool exists already (i.e., 'git' is plenty flexible
> > that I can script it myself).
> >
> > Or maybe I'm missing some larger reason?
>
> The thing to remember is just how simple git is...  There's no magical
> way to carry over a history artificially -- it's whatever is in the
> commits.
>
> To make this more concrete (and more verbose), in this context the
> point is that git filter-branch is a simple tool that basically
> replays the complete history, allowing you to plant various hooks to
> change the directory structure, commit messages or whatever.  The new
> history is whatever new commits are in the revised repository, with no
> way to make up a history with anything else.
>
> Now, to make my first point about the potential loss of history that
> is inherent in the process -- say that you want to split out a
> "drracket" repo in a naive way: taking just that one directory.  Since
> it's done naively, the resulting repository will not have the
> "drscheme" directory and its contents, which means that you lose all
> history of files that happened there.  To try that (in a fresh clone,
> of course) -- first, look at the history of a random file in it:
>
>   F=collects/drracket/private/app.rkt
>   git log --format='----%n%h %s' --name-only --follow -- "$F"
>
> Now do the revision:
>
>   S=collects/drracket
>   git filter-branch --prune-empty --subdirectory-filter "$S" -- --all
>
> And run the same log command again; the history is gone:
>
>   git log --format='----%n%h %s' --name-only --follow -- "$F"
>
> If you look at the *new* file, you do see the history, but the
> revisions made in "drscheme" are gone:
>
>   git log --format='----%n%h %s' --name-only --follow -- private/app.rkt
>
> In any case, this danger is there no matter what, especially in our
> case since code has been moving around in the "racket" switch.  I
> *hope* that most of it will be simple: like carrying along the
> "drscheme" directory with "drracket", the "scheme" and "mzlib" with
> "racket", etc.  Later on, if these things move to "compat" packages,
> the irrelevant directories get removed from the repo without
> surgeries, so the history will still be there.  This shows some of the
> tricks that might be involved in the current switch: if you'd want to
> have some "compat" package *now*, the right thing to do would be:
>
>   * do a simple filter-branch to extract "drscheme" (and other such
>     collections) in a new repository for "compat"
>
>   * for "drracket": do a filter-branch that keeps *both* directories
>     in, then commit a removal of "drscheme".  (Optionally, use rebase
>     to move the deletion backward...)
>
> Going back to the repo structure change that you want, the reason
> that I said that doing moves between the package directories
> post-restructure is destructive should be clear now: say that you move
> collects/A/x into foo/A/x as part of the restructure.  Later you
> realize that A/x should go into the bar package instead so you just
> move it to bar/A/x.  The history is now in, including the rename, but
> later on when bar is split into a separate repo, the history of the
> file is gone.  Instead, it appears in the foo repository, ending up
> being deleted.
>
> One way to get around this is to avoid moving the file -- instead, do
> another filter-branch surgery.  This will be a mess since each such
> change will mean rebuilding the repository with all the pain that this
> implies.  Another way to get around it is to keep track of these
> moving commits, and when the time comes to split into package repos,
> you first do another surgery on the whole repo which moves foo/A/x to
> bar/A/x for all of the commits before the move (not after, since that
> could lead to other problems), and then do the split.
>
> This might work, but besides being very error-prone, it means doing
> the same kind of file-movement tracking that I'm talking about anyway.
> So take this all as saying that the movement of files between packages
> needs to be tracked anyway -- but with my suggestion the movement is
> delayed until it's known to be final before the repo split, which
> makes it more robust overall.
>
> ----
>
> But really, the much more tempting aspect for me is that this can be
> done now -- if you give me a list of packages and files, I can already
> do the movement script.
>
> Actually, in an attempt to tempt you more, here's what I can do now
> (as in the very near future):
>
> Start from the list of directories/files in your min repo as a
> specification of the contents of the core package, and decide that
> everything else is in another "everything-else" package.  (Since
> there's no actual file movements, it is cheap to use temporary names
> and partial specifications.)
>
> Then, change how the build works on the main machine (leave the other
> machines as is for now): after the initial few steps of updating
> version files, etc., the script doesn't use a repo -- it uses just the
> exported directory.  So after it exports the directory for building,
> the main machine will:
>
>   - run the script to get the package directories, so you get
>     something like (in $PLTHOME, wherever the build works):
>
>       collects \
>       doc       \  all of these
>       man       /  are empty
>       src      /
>       core/collects
>       core/man
>       core/src
>       everything-else/collects
>
>   - it now moves core/* up a level (and removes the empty "core"
>     directory)
>
>   - do the regular build: executables + raco setup
>
>   - next, move everything-else/* up a level too
>
>   - run another setup
>
> This means that now the build makes sure that the dependencies are
> fine: that the core doesn't depend on everything-else.  Later on, we
> can split another package out from everything-else, and insert it into
> the above sequence: build the core, add P, run setup, add everything
> else, run a final setup.  It can even get more sophisticated:
>
>   - build core,
>   - add P1, setup, move the built P1 out,
>   - add P2, setup, move the built P2 out,
>   - add everything-else and the built P1 & P2, run a final setup
>
> Yes, this is duplicating the dependency info between the packages, but
> this is all done temporarily (and for a small number of packages)
> until the proper package-based build is working and replaces it.
>
> In other words -- not only is my suggestion implementable now, it
> allows the project to proceed faster: you can go on with doing the
> package build, while everyone needs to deal with respecting
> dependencies (deciding on which package a file goes with, avoiding
> breaking these dependencies).
>
> --
>           ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                     http://barzilay.org/                   Maze is Life!
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev
>

Posted on the dev mailing list.