[racket] structural program comparison tool in Racket
5 hours ago, Asumu Takikawa wrote:
> On 2012-09-14 04:34:21 +0800, Yin Wang wrote:
> > I'm developing a tool for "diffing" program by parse trees and not
> > text. It is written in Racket and can process Lisp family
> > languages, C++, JavaScript and Python. It has a JavaScript based
> > interactive UI for browsing the diff results.
> >
> > You can find a demo of it (diffing two Emacs Lisp programs) here:
> >
> > http://www.cs.indiana.edu/~yw21/demos/paredit20-paredit22.html
>
> This is really neat! Any plans for a DrRacket plugin? :)
IMO, this would be cute, but I think that a tool like this has *way*
more potential...
IOW, I think that it would be nicer to have a good API for the tool,
something that takes two files (or two pieces of text) and returns a
representation of the additions/deletions and the mapping between
chunks of text that are the same. This way instead of one tool that
you (Yin) write, many people can write many tools.
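To make the shape of such an API concrete, here is a rough sketch in
Python. Everything in it is hypothetical: the names `Chunk` and
`diff_texts` are made up for illustration, and the matching is done
with the standard library's token-level `difflib.SequenceMatcher` as
a stand-in for the real parse-tree matcher. The point is only the
return value: a list of chunks, each marked as same, added, or
deleted, with character spans into the old and new text so that
clients can map matching pieces onto each other.

```python
import difflib
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass
class Chunk:
    kind: str             # "same", "added", or "deleted"
    old: Optional[Span]   # span in the old text (None for additions)
    new: Optional[Span]   # span in the new text (None for deletions)


# Parens and brackets are tokens of their own; anything else is a
# run of non-space, non-bracket characters.
TOKEN = re.compile(r"[()\[\]]|[^\s()\[\]]+")


def tokenize(text: str) -> List[Tuple[str, Span]]:
    """Split text into (token, span) pairs."""
    return [(m.group(), m.span()) for m in TOKEN.finditer(text)]


def diff_texts(old_text: str, new_text: str) -> List[Chunk]:
    """Hypothetical API: return chunks with spans into both texts.

    Assumption: a token-level matcher stands in for the structural
    (parse tree) comparison that the real tool performs.
    """
    old = tokenize(old_text)
    new = tokenize(new_text)
    matcher = difflib.SequenceMatcher(
        None, [t for t, _ in old], [t for t, _ in new], autojunk=False)
    chunks: List[Chunk] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_span = (old[i1][1][0], old[i2 - 1][1][1]) if i2 > i1 else None
        new_span = (new[j1][1][0], new[j2 - 1][1][1]) if j2 > j1 else None
        if tag == "equal":
            chunks.append(Chunk("same", old_span, new_span))
        else:
            # "replace" is reported as a deletion followed by an addition.
            if old_span is not None:
                chunks.append(Chunk("deleted", old_span, None))
            if new_span is not None:
                chunks.append(Chunk("added", None, new_span))
    return chunks
```

A client (a stepper, a plagiarism checker, a renderer) would then
walk the list and use the spans to highlight or align the two texts,
without caring how the matching was computed.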
Some examples for things that could be written with such an API:
* IIRC, we have two sexpr-diffing things, one in the tree and one on
planet, and at least one of them is used in testing.
* It could obviously be used for a better homework-copying detection
tool.
* It could be used in tools that show two pieces of similar text. Two
cases of that are the usual stepper and the macro one. They could
use it to render just the differences between two adjacent texts.
Although in both of these cases the tools themselves have their own
semantic information that is more precise, my guess is that your
tool will work fine in most practical cases, and could be used for
similar tools in the future.
* A *very* useful thing to do would be to write a wrapper that uses
your API and spits out text in the word-diff format that is used in
git. (Try running git with "--word-diff=porcelain" to see what that
looks like.) The reason that this is useful is that now it becomes
possible to take tools that work with the git format, and reuse them
as is with this wrapper. Just think about things like:
- Plugging it into things like gitweb (or aim high and go straight
to github etc) to get better diff output that is sexpr-aware.
- Plugging it into git itself. Git has a hook to run your own diff
commands (e.g., on all *.rkt* files), and it sounds like your tool
will make lots of people in the various Lisp tribes be *very*
happy.
There is, BTW, a discrepancy there -- IIRC, the git word-diff format
is just that, so there is no place to add a mapping between the
matching pieces of text.
* Also, I imagine that git with this tool could be used in more
interesting ways. Think about a "git blame" that shows the history
of each expression in a way that is more robust than what it can do
now. Now *that* could be an extremely useful DrRacket plugin.
(Imagine a button like syntax-check, that after you click it
mouse-hovering shows the commit(s) in which the expression was
constructed. You could ask the person who wrote it to clarify
things and not worry about an answer like "I just refactored code in
that file, I don't know who wrote it".)
But that's assuming that the tool finds things even if they're not
in the same order. (Which I imagine makes the problem hard enough
to require some heuristic searching.)
* Finally, it would be trivial to adapt it to XML, JSON, etc. Just
use Racket parsers, print the sexprs, and run the diff.
* And one last side-comment: if the API works at the syntax object
level, so you get the output as things that have source information,
then it becomes trivial to adapt it to any language that is
implemented in Racket. Just parse the code into a syntax object,
run the tool, then render the results using the source information
so you see it over the original syntax.
* Profit.
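
For the word-diff wrapper idea in the list above, here is a rough
sketch in Python of what the output side could look like. It is an
approximation under stated assumptions: the function name
`word_diff_lines` is made up, the matching is token-level `difflib`
rather than the real structural diff, only the body lines are
produced (no file header or hunk lines), the input is treated as a
single line, and tokens are re-joined with single spaces so the
original whitespace is not preserved. What it does follow is the
basic shape of the porcelain format: each run is on its own line,
prefixed with a space (unchanged), "-" (removed) or "+" (added), and
a newline in the input is shown as "~" on a line of its own.

```python
import difflib
import re
from typing import List

# Parens and brackets are tokens of their own; anything else is a
# run of non-space, non-bracket characters.
TOKEN = re.compile(r"[()\[\]]|[^\s()\[\]]+")


def word_diff_lines(old_text: str, new_text: str) -> List[str]:
    """Emit word-diff-porcelain-style body lines for one line of code.

    Assumptions: single-line input, token-level matching as a stand-in
    for the structural diff, and tokens joined with single spaces.
    """
    old = TOKEN.findall(old_text)
    new = TOKEN.findall(new_text)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    out: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" " + " ".join(old[i1:i2]))
        else:
            # A replaced run becomes a removed run plus an added run.
            if i2 > i1:
                out.append("-" + " ".join(old[i1:i2]))
            if j2 > j1:
                out.append("+" + " ".join(new[j1:j2]))
    out.append("~")  # end of the (single) input line
    return out


if __name__ == "__main__":
    for line in word_diff_lines("(define (f x) (+ x 1))",
                                "(define (f x) (+ x 2))"):
        print(line)
```

As noted above, this format has no place for the mapping between
matching pieces, so a wrapper like this would drop that part of the
tool's result.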
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!