[racket-dev] git rebase annoyances

From: Eli Barzilay (eli at barzilay.org)
Date: Thu Jul 8 16:55:04 EDT 2010

So, rebasing is a common operation since we're working with a central
repository.  But it's very annoying that files are touched whenever it
happens.  For example, you modify "collects/foo/bar.rkt", you then run
a "git pull --rebase" to get updated with the current state, and since
I pushed some changes to "collects/meta/web", the files you changed
are all touched (they get new timestamps) even though their contents
are the same.

This is due to the way that rebase works -- when the above happens,
git will: (a) undo your changes to get back to the `origin/master'
version, (b) update your `master' branch, (c) replay your changes on
top of it.  I suspect that it makes sense to do this in the
filesystem, since (c) can abort at any point due to conflicts.
(Perhaps they could optimize it so it's not done in the actual files,
but it would still be annoying if you need to resolve a conflict in
"collects/foo/blah.rkt" and as a result the "bar.rkt" file is touched
too.)

I believe that this problem has bugged many people, and was the reason
Matthew added the sha1s in bytecode files.  Of course, DrRacket can be
"fixed" too, as well as Emacs and any other tool -- but that's not
right.  (DrRacket is "worse" in this sense -- since, unlike other
tools, it allows you to actually work with a file that is not saved,
it makes less sense to make it ignore a timestamp change when the
contents are the same.)

So I wrote a script that should help in avoiding this.  The basic idea
is to save the state of the repo -- all files with their hashes and
timestamps, and later "restore" the state by looking at the new hashes
and timestamps, and if there's any file with the same hash but a
different timestamp, then its time is modified to the old time.  It's
not the greatest way to solve this problem, but it works fine, and
it's even fast enough to be practical.
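
As a rough illustration of the save/restore idea (this is not the script
below, which is written in Racket and takes its hashes from git), here is
a minimal POSIX shell sketch.  It makes several assumptions: file names
are flat (no slashes), `cksum' on the working files stands in for git's
hashes, and each timestamp is kept on an empty stamp file.  The names
`save_state', `restore_state' and the state directory are made up for the
example.

```shell
# save_state STATE_DIR FILE...
# Record a checksum and the current mtime of each FILE.
# (Hypothetical helper; assumes FILE names contain no slashes.)
save_state() {
  ss_dir=$1; shift
  mkdir -p "$ss_dir"
  for ss_f in "$@"; do
    cksum < "$ss_f" > "$ss_dir/$ss_f.sum"      # contents fingerprint
    touch -r "$ss_f" "$ss_dir/$ss_f.stamp"     # stamp file carries the mtime
  done
}

# restore_state STATE_DIR FILE...
# For each FILE whose contents match the saved checksum, put the saved
# mtime back; files whose contents changed are left alone.
restore_state() {
  rs_dir=$1; shift
  for rs_f in "$@"; do
    [ -f "$rs_dir/$rs_f.sum" ] || continue     # never saved: skip
    if [ "$(cksum < "$rs_f")" = "$(cat "$rs_dir/$rs_f.sum")" ]; then
      touch -r "$rs_dir/$rs_f.stamp" "$rs_f"
    fi
  done
}
```

With these loaded, one would run something like `save_state .state bar.rkt'
before an operation that rewrites files, and `restore_state .state bar.rkt'
afterwards.  Unlike the sketch, the real script only resets a timestamp
when it actually differs, and it never reads file contents itself.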

The script follows -- see the big comment at the top for instructions
on how to use it.  It's a little rough at this point, but I've tried
it a few times and it works as it should.  (It might make more sense
to improve it in the future by mapping each path to an alist of hashes
and timestamps, instead of keeping only the last pair.)  Please mail
me any feedback -- *including* positive feedback (if it works well
I'll suggest it on the git mailing list, and it'll help if I can say
that a number of people used this approach and it improved their
workflow).
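
The possible improvement mentioned above (remembering several hash and
timestamp pairs per path instead of only the last one) could look
roughly like the following POSIX shell sketch.  This is only an
illustration under assumptions: flat file names, `cksum' as a stand-in
for git's hashes (it is not collision-resistant), and the hypothetical
helper names `remember_version' and `recall_version'.

```shell
# remember_version HIST_DIR FILE
# Keep one stamp file per (path, checksum) pair, so that several
# versions of the same path can each keep their own timestamp.
remember_version() {
  mkdir -p "$1"
  rv_sum=$(cksum < "$2" | tr ' ' '-')    # "CRC SIZE" -> "CRC-SIZE"
  touch -r "$2" "$1/$2.$rv_sum"
}

# recall_version HIST_DIR FILE
# If the current contents of FILE were seen before, restore the mtime
# that was recorded for exactly those contents; otherwise do nothing.
recall_version() {
  rc_sum=$(cksum < "$2" | tr ' ' '-')
  if [ -f "$1/$2.$rc_sum" ]; then
    touch -r "$1/$2.$rc_sum" "$2"
  fi
}
```

The point of the extra bookkeeping is that a file which goes from one
version to another and back again can get its first timestamp back,
which a table holding only the last pair would have forgotten.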

-------------------------------------------------------------------------------
#!/bin/sh
#| -*- scheme -*-
exec mzscheme -um "$0" "$@"
|#

#|

  This is a helper to minimize changes to file timestamps by git commands (eg,
  rebasing).  This is done by saving a state table that maps each file in the
  repo to its hash (as given by git) and timestamp.  The table is later used to
  restore the state -- and for any file with an identical hash to the saved
  state but a different timestamp, we restore the saved timestamp.

  Executive summary: use "gitp" instead of "git" when rebasing is involved.
  Eg: "gitp pull --rebase".  If there are conflicts, use "gitp" when you're
  ready to continue after resolving the conflicts: "gitp rebase --continue".

  (This could have been more robust by computing a hash for each file, since
  git produces the hash of the file from its store, therefore ignoring any
  changes that are not committed or staged.  But doing so would be extremely
  slow, and in any case the problem is with files that git knows about.)

  There is a complication in making this work: the git commands in question do
  not always run to completion.  For example, a rebase will stop when there
  are conflicts to let you resolve them, and resume with `--continue'.
  Therefore, there are two ways to use this script:

  1. Manually save/restore state

     Run "gitp 1" to save the current state, and later run "gitp 2" to restore
     it.  (The "1" "2" were the most convenient things for me to remember...)
     There is also a "gitp 3" which removes the saved state, see below for a
     use case.

  2. Run it as a git replacement: "gitp <git-verb-and-args>".  In this mode,
     the script will save the state before running the git command, and restore
     the state when it's done.  The state is still left in the .git directory,
     so it can be reused later using "gitp 2".

     In some cases like conflicts during rebasing, the git command will exit
     with an error code -- if this happens, the script marks the dumped state
     in a way that the next time it is called as a git replacement (ie, not
     with "1" or "2"), it will restore the originally saved state (before the
     first git command was called) rather than the state the new command
     started in.  (If you want to avoid this, you can run "gitp 3" to remove
     the saved state.)  Note that you should use this script only in a step
     that resolves the issue -- for example, "gitp rebase --continue"; do not
     use it for other git commands (eg, "gitp add conflicted/file") since that
     will reuse the previous state and lose it for the following "--continue"
     step.

  Note that the script does nothing more than change file timestamps, therefore
  any possible damage it can cause is limited to that.  In other words, the
  file contents themselves should be safe.

|#

#lang racket/base

(require racket/port)

(define this
  (let-values ([(dir file dir?) (split-path (find-system-path 'run-file))])
    (string->symbol (path->string file))))
(define (note fmt . args)
  (printf "~a: ~a\n" this (apply format fmt args)))

(define git
  (let ([exe (or (find-executable-path "git")
                 (error this "could not find the git executable"))]
        [stderr    (current-error-port)]
        [from-null (open-input-file "/dev/null")])
    (lambda (reader/outp . args)
      ;; reader/outp can be a function that consumes the subprocess's stdout,
      ;; or a port to dump it onto; in the latter case, the exit status is
      ;; returned to the caller instead of exiting when it is not 0.
      (define-values [pid pout pin perr]
        (apply subprocess (if (output-port? reader/outp) reader/outp #f)
               from-null stderr exe args))
      (let ([r (if (output-port? reader/outp)
                 (void) (reader/outp pout))])
        (subprocess-wait pid)
        (let ([s (subprocess-status pid)])
          (cond [(output-port? reader/outp) s]
                [(zero? s) r]
                [else (exit s)]))))))

(define (->line in) (regexp-replace #rx"\n$" (port->string in) ""))
(define (->lines in) (port->lines in))

(current-directory (git ->line "rev-parse" "--show-toplevel"))

(define metadata
  (format "~a/.~a-state" (git ->line "rev-parse" "--git-dir") this))
(define metadata-use-prev (string-append metadata "-use-prev"))

;; each `git ls-files -s -z' entry is "<mode> <hash> <stage>\t<path>\0"
(define rx:ls-files #rx#"^[0-9]+ ([0-9a-f]+) [0-9]+\t([^\0]+)\0")
(define (get-current-state)
  (let ([t (make-hash)])
    (git (lambda (in)
           (let loop ()
             (let ([m (regexp-match rx:ls-files in)])
               (when m
                 (let ([hash (cadr m)] [file (caddr m)])
                   (hash-set! t file
                              (cons hash (file-or-directory-modify-seconds
                                          (bytes->path file)))))
                 (loop)))))
         "ls-files" "-s" "-z")
    t))

(define (write-state state)
  (call-with-output-file metadata #:exists 'truncate
    (lambda (o) (write state o))))

(define (save-state [state (get-current-state)])
  (write-state state)
  (note "saved current state"))

(define (delete-state)
  (if (file-exists? metadata)
    (begin (note "removing saved state") (delete-file metadata))
    (note "no state to remove"))
  (when (file-exists? metadata-use-prev) (delete-file metadata-use-prev)))

(define (restore-state [existing-state #f])
  (define old-state
    (cond [existing-state]
          [(file-exists? metadata) (call-with-input-file metadata read)]
          [else (error this "missing metadata file to restore state from")]))
  (define new-state (get-current-state))
  (define changed? #f)
  (note "restoring state")
  (for ([(file old) (in-hash old-state)])
    (let ([new (hash-ref new-state file #f)])
      (when new
        (let ([old-hash (car old)]
              [old-time (cdr old)]
              [new-hash (car new)]
              [new-time (cdr new)])
          (when (and (equal? old-hash new-hash)
                     (not (equal? old-time new-time)))
            (set! changed? #t)
            (note "  ~a" file)
            (file-or-directory-modify-seconds (bytes->path file) old-time)
            (hash-set! new-state file old))))))
  (unless changed? (note "  (nothing to restore)") (flush-output))
  (write-state new-state))

(define (run-protected-command args)
  (define state
    (if (file-exists? metadata-use-prev)
      (begin (note "** previous call indicated an error code")
             (note "   reusing previous state")
             (delete-file metadata-use-prev)
             (call-with-input-file metadata read))
      (begin (note "getting current state")
             (get-current-state))))
  (let ([s (apply git (current-output-port) args)])
    (write-state state)
    (restore-state state)
    (unless (zero? s)
      (note "** git returned with an error code (see above for details),")
      (note "   the initially saved state will be used next time")
      (note "   this script is invoked with a git command to execute.")
      (call-with-output-file metadata-use-prev #:exists 'truncate newline))))

(provide main)
(define (main . args)
  (cond [(equal? args '("1")) (save-state)]
        [(equal? args '("2")) (restore-state)]
        [(equal? args '("3")) (delete-state)]
        [else (run-protected-command args)]))
-------------------------------------------------------------------------------

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the dev mailing list.