[racket] structural program comparison tool in Racket

From: Eli Barzilay (eli at barzilay.org)
Date: Thu Sep 13 22:04:06 EDT 2012

5 hours ago, Asumu Takikawa wrote:
> On 2012-09-14 04:34:21 +0800, Yin Wang wrote:
> > I'm developing a tool for "diffing" programs by parse trees and not
> > text. It is written in Racket and can process Lisp family
> > languages, C++, JavaScript and Python. It has a JavaScript based
> > interactive UI for browsing the diff results.
> >
> > You can find a demo of it (diffing two Emacs Lisp programs) here:
> >
> > http://www.cs.indiana.edu/~yw21/demos/paredit20-paredit22.html
> This is really neat! Any plans for a DrRacket plugin? :)

IMO, this would be cute, but I think that a tool like this has *way*
more potential...

IOW, I think that it would be nicer to have a good API for the tool,
something that takes two files (or two pieces of text) and returns a
representation of the additions/deletions and the mapping between
chunks of text that are the same.  This way, instead of one tool that
you (Yin) write, many people can write many tools.
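
To make the shape of such an API a bit more concrete, here is a rough
sketch in Python, purely for illustration.  Everything in it is
hypothetical: `tree_diff`, the path-based result format, and the greedy
"largest identical subtree first" matching are my assumptions, not how
Yin's tool works.  Sexprs are modeled as nested tuples of string atoms,
and the result holds a mapping between identical subtrees (found
regardless of position) plus the leaves that were deleted or added.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Hypothetical representation: a sexpr is either a string atom or a tuple
# of sexprs; a node is addressed by its path of child indexes from the root.
Tree = Union[str, tuple]
Path = Tuple[int, ...]


def subtrees(t: Tree, path: Path = ()) -> Iterator[Tuple[Path, Tree]]:
    """Yield (path, node) for every node of t, in pre-order."""
    yield path, t
    if isinstance(t, tuple):
        for i, child in enumerate(t):
            yield from subtrees(child, path + (i,))


def size(t: Tree) -> int:
    """Number of nodes in t (an atom counts as one node)."""
    return 1 + sum(size(c) for c in t) if isinstance(t, tuple) else 1


def covered(path: Path, roots) -> bool:
    """True if path is one of roots or lies inside one of them."""
    return any(path[:len(r)] == r for r in roots)


def tree_diff(old: Tree, new: Tree) -> dict:
    """Greedily map identical subtrees of old to subtrees of new, largest
    first and regardless of position, then report the leftover leaves."""
    new_nodes = list(subtrees(new))
    # sorted() is stable, so equal-sized subtrees keep their pre-order.
    old_nodes = sorted(subtrees(old), key=lambda pn: -size(pn[1]))
    mapping: Dict[Path, Path] = {}
    for opath, onode in old_nodes:
        if covered(opath, mapping.keys()):
            continue  # already inside a larger matched subtree
        for npath, nnode in new_nodes:
            if nnode == onode and not covered(npath, mapping.values()):
                mapping[opath] = npath
                break
    deleted: List[Tuple[Path, str]] = [
        (p, n) for p, n in subtrees(old)
        if not isinstance(n, tuple) and not covered(p, mapping.keys())]
    added: List[Tuple[Path, str]] = [
        (p, n) for p, n in subtrees(new)
        if not isinstance(n, tuple) and not covered(p, mapping.values())]
    return {"mapping": mapping, "deleted": deleted, "added": added}
```

For `(define (f x) (+ x 1))` versus `(define (f x) (* x 2))` this maps
`define`, `(f x)` and the inner `x` to their counterparts, and reports
`+` and `1` as deleted and `*` and `2` as added.  The quadratic search
is fine for a sketch; real files would need something smarter, such as
an index of subtree hashes.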

Some examples for things that could be written with such an API:

* IIRC, we have two sexpr-diffing things, one in the tree and one on
  PLaneT, and at least one of them is used in testing.

* It could obviously be used for better homework-copying detection.

* It could be used in tools that show two pieces of similar text.  Two
  cases of that are the usual stepper and the macro stepper.  They could
  use it to render just the differences between two adjacent texts.
  Although in both of these cases the tools themselves have their own
  semantic information that is more precise, my guess is that your
  tool will work fine in most practical cases, and could be used for
  similar tools in the future.

* A *very* useful thing to do would be to write a wrapper that uses
  your API and spits out text in the word-diff format that git uses.
  (Try running git with "--word-diff=porcelain" to see what that
  looks like.)  This is useful because it becomes possible to take
  tools that work with the git format and reuse them as-is with this
  wrapper.  Just think about things like:
  - Plugging it into things like gitweb (or aim high and go straight
    to GitHub, etc.) to get better, sexpr-aware diff output.
  - Plugging it into git itself.  Git has a hook to run your own diff
    commands (eg, on all *.rkt* files), and it sounds like your tool
    will make lots of people in the various Lisp tribes *very* happy.

  There is, BTW, a discrepancy there: IIRC, the git word-diff format
  only marks runs of text as added, removed, or unchanged, so there is
  no place to put a mapping between the matching pieces of text.

* Also, I imagine that git with this tool could be used in more
  interesting ways.  Think about a "git blame" that shows the history
  of each expression in a way that is more robust than what it can do
  now.  Now *that* could be an extremely useful DrRacket plugin.
  (Imagine a button like Check Syntax: after you click it, hovering
  the mouse over an expression shows the commit(s) in which it was
  constructed.  You could ask the person who wrote it to clarify
  things, without worrying about an answer like "I just refactored
  code in that file, I don't know who wrote it".)

  But that's assuming that the tool finds things even if they're not
  in the same order.  (I imagine that this makes the problem hard
  enough to require some heuristic search.)

* Finally, it would be trivial to adapt it to XML, JSON, etc.  Just
  use Racket parsers, print the sexprs, and run the diff.

* And one last side-comment: if the API works at the syntax-object
  level, so that the output consists of things that carry source
  information, then it becomes trivial to adapt to any language that
  is implemented in Racket.  Just parse the code into a syntax object,
  run the tool, then render the results using the source information
  so you see them over the original text.

* Profit.
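
For the word-diff wrapper idea, here is a rough sketch of the output
side, again in illustrative Python.  It is hypothetical: `tokenize` and
`word_diff_porcelain` are made-up names, and it runs Python's
`difflib.SequenceMatcher` over sexpr-aware tokens as a stand-in for a
real structural diff.  It only produces the body of one hunk in the
style of git's `--word-diff=porcelain` output (each run on its own line
prefixed by a space, `-` or `+`, and a `~` line for each newline); the
`diff --git` and `@@` header lines that git also prints are left out.

```python
import difflib
import re
from typing import List

# Sexpr-aware tokens (an assumption, not git's default word regex): a
# newline, a run of other whitespace, a single paren, or a run of other
# characters.  Every character falls into one class, so joining the
# tokens gives back the original text.
TOKEN_RE = re.compile(r"\n|[^\S\n]+|[()]|[^\s()]+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def word_diff_porcelain(old: str, new: str) -> List[str]:
    """Return hunk-body lines in the style of git's --word-diff=porcelain."""
    a, b = tokenize(old), tokenize(new)
    out: List[str] = []

    def emit(prefix: str, tokens: List[str]) -> None:
        # Print a run with its prefix; each newline token ends the current
        # run and becomes a "~" line (this sketch does not record whether
        # the newline itself was added or removed).
        buf = ""
        for tok in tokens:
            if tok == "\n":
                if buf:
                    out.append(prefix + buf)
                    buf = ""
                out.append("~")
            else:
                buf += tok
        if buf:
            out.append(prefix + buf)

    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            emit(" ", a[i1:i2])
        if tag in ("delete", "replace"):
            emit("-", a[i1:i2])
        if tag in ("insert", "replace"):
            emit("+", b[j1:j2])
    return out
```

Printing `"\n".join(word_diff_porcelain(old, new))` gives text in the
same line-oriented shape that existing consumers of git's porcelain
output already parse, which is the whole point of the wrapper.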

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the users mailing list.