[racket-dev] proposal for moving to packages: repository

From: Eli Barzilay (eli at barzilay.org)
Date: Fri May 24 12:44:35 EDT 2013

Four hours ago, Matthew Flatt wrote:
> At Fri, 24 May 2013 03:26:45 -0400, Eli Barzilay wrote:
> > If that can be done reliably, then of course it makes it possible to
> > do the split reliably after the first restructure.
> 
> Great! Let's do that, because I remain convinced that it's going to
> be a lot easier.

I'm really surprised.  Given that you consider this a *lot* easier,
and that I consider it (reorganization + split) a lot messier, I think
that I'm still not getting something.


> > * Also, I'd worry about file movements on top of paths that
> >   existed under a different final path at some point
> 
> I believe the file-lifetime computation in "slice.rkt" takes care of
> that.

That's what it looks like, but I'd double-check to make sure that it
happens.


> > * The script should also take care to deal with files that got
> >   removed in the past.
> 
> Ditto.

I believe that it's *not* doing this, so I did the double-check in the
form of a test.  When I run it, I see these bad things (which I
expected to happen, which is why I wrote it as a test):

* The "c" file got completely lost (this is the pre-reorganization
  file deletion scenario)

* The "b" file got lost too (post-reorg deletion)

* The history of "e" during the "A" days got lost, since it was not
  recognized as a rename in the A->B move due to being edited too.

=> The first two are things that a script can deal with by doing some
   kind of scan like I mentioned (go over the full history of the full
   tree).

=> The third one is something that requires human judgment, *but* if
   the historic A/e file is considered deleted, and if deleted files
   from the original directories are included by the scan above, then
   it should still be there in the rewritten repo.
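
As a rough illustration of that kind of scan (not what "slice.rkt"
does, and not the attached test), here is a small Python sketch.  It
assumes the history has already been flattened into per-commit lists of
(status, path) pairs in the style of `git log --name-status`, with an
unrecognized rename showing up as a delete plus an add.  The function
names and the toy history are hypothetical; the history just mirrors
the "b", "c" and "e" cases above.

```python
def live_paths(commits):
    """Paths that exist in the final tree (what a naive slice would see)."""
    live = set()
    for changes in commits:
        for status, path in changes:
            if status == "A":
                live.add(path)
            elif status == "D":
                live.discard(path)
    return live


def paths_ever_seen(commits):
    """Every path that was added at any point, even if deleted later."""
    seen = set()
    for changes in commits:
        for status, path in changes:
            if status == "A":
                seen.add(path)
    return seen


# Toy history, oldest commit first (hypothetical data).
history = [
    [("A", "A/a"), ("A", "A/b"), ("A", "A/c"), ("A", "A/e")],
    [("D", "A/c")],                      # pre-reorganization deletion
    [("D", "A/a"), ("A", "B/a"),         # the A->B move, with every move
     ("D", "A/b"), ("A", "B/b"),         # recorded as delete + add, as
     ("D", "A/e"), ("A", "B/e")],        # happens when "e" is edited too
    [("D", "B/b")],                      # post-reorganization deletion
]

final = live_paths(history)
everything = paths_ever_seen(history)
# Paths that only a scan of the full history would pick up.
only_in_history = sorted(everything - final)
print("final tree:", sorted(final))
print("only in history:", only_in_history)
```

The point is only that the set of paths to keep has to come from the
whole history rather than from the final tree; deciding that A/e and
B/e are the same file is still the human-judgment part.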

Test file attached; you probably need to do very little other than
adjust the paths to the two racket scripts.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: b
Type: application/octet-stream
Size: 999 bytes
Desc: not available
URL: <http://lists.racket-lang.org/dev/archive/attachments/20130524/f4db0464/attachment.obj>
-------------- next part --------------


> > * Actually, given the huge amount of time it's running (see next
> >   bullet), it's probably best to make it do the movements from all
> >   paths at the same time.
> 
> There's no need to move anything while extracting a repository
> slice; the movements happen before.

What I'm saying is that if filter-branch using your script takes 20
hours, and you want to use it to split the repo into 5 packages, and
if a simple filter-branch with a subdirectory filter takes a few
minutes, then instead of:

  * filter-branch using your script 5 times to create each repository
  Total runtime: more than 4 days

you do this:

  * filter-branch one time using your script to reorganize the files
    according to packages
  * use filter-branch with a subdirectory filter 5 times to create
    each repository
  Total runtime: about 21 hours

This latter use would end up with the final tree being exactly the
same (since you're talking about doing the reorganization within git),
but the history would be different since it's as if the files were
there the whole time.
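
To spell out the arithmetic behind the two totals, here is a tiny
Python sketch.  The 20 hours and the 5 packages come from the text
above; the 10 minutes is only an assumed stand-in for "a few minutes".

```python
SCRIPT_HOURS = 20.0    # one filter-branch run using the slicing script
SUBDIR_MINUTES = 10.0  # assumed value for "a few minutes"
PACKAGES = 5

# Option 1: run the slow script once per package.
per_package_total = PACKAGES * SCRIPT_HOURS

# Option 2: run the slow script once, then a fast subdirectory filter
# once per package.
reorg_then_split_total = SCRIPT_HOURS + PACKAGES * SUBDIR_MINUTES / 60.0

print(f"script once per package:     {per_package_total:.0f} h "
      f"({per_package_total / 24:.1f} days)")
print(f"reorganize once, then split: {reorg_then_split_total:.1f} h")
```

With these numbers the first option comes to 100 hours (a bit over 4
days) and the second to a bit under 21 hours.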

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.