[racket-dev] Line editing in the default REPL

From: Eli Barzilay (eli at barzilay.org)
Date: Wed Dec 3 18:10:03 EST 2014

On Wed, Dec 3, 2014 at 2:45 PM, Leif Andersen <leif at leifandersen.net> wrote:
> My goal was not to replace xrepl, but to provide basic line editing
> support to the default repl without licensing violations or massively
> increasing the distribution size.

Yes, that's exactly what I was talking about.  I hope that you know what
you're doing, though so far the "easy to underestimate" part seems to be
the case:

>> If you're talking about implementing line editing yourself, then my
>> personal reaction to that would be "wonderful", but doing it properly
>> is something that is difficult and easy to underestimate....
>
> I've already done this (admittedly it only works on OS X, but most
> Linux terminals work exactly the same with a few different
> constants). You can see what I have so far here:
> https://github.com/LeifAndersen/racket-line-editor/blob/master/main.rkt

If this works, I'd be thoroughly impressed -- so I tried it.  I ran
through a bunch of configurations that I use:

- plain xterm,
- ssh in xterm,
- linux text console,
- ssh in a linux console,
- `term' in Emacs (a terminal emulator, not the more popular inferior
  shell),
- ssh from windows on a mintty,
- ssh from windows in a cmd box (Win7, much better than past versions)
  and in a powershell window,
- putty from windows.

(The racket process is always running on a linux machine, there's no
need for it on Windows.)  In each of these I first tried readline (a
basic test: start racket, move around, go back in history to a multiline
entry) -- it worked fine in all cases.

I then tried your code -- took your file as-is, and added

    (module+ main (line-editor "rkt> "))

at the end.  It failed in all cases.  The failures varied, but in pretty
much all cases it just spits out an escape sequence (which is not the
right one for the terminal, otherwise it wouldn't be shown), a different
one in each case.  I then tried some navigation keys, and none of them
worked either.  In some cases they'd move the cursor where it shouldn't
go (the cmd-based cases, as expected), in other cases keys behaved in
various broken ways: spitting uncaught escapes, moving two lines up for
each keypress, the <end> key moved down a line in one case, the <return>
key spits out a "^M".  Using C-j twice (in some cases; I'm too tired to
try them all) seems to give your code what it wants to start reading so
it displays the prompt when I do that -- then, the navigation keys are
still broken, unsurprisingly (and random escape sequences still garble
what I see).  Another weirdness is that it looks like your code waits
for a ^M, sometime later followed by a ^J, and then it returns a string:
this is very broken since a ^J is ignored before the ^M, and there can
be any number of characters between the ^M and the ^J.  (But that might
be bogus, since things are very garbled anyway, so it's hard to guess
what's going on.)

But it's actually looking at the code that makes me conclude that you're
underestimating how complicated getting this right can be.  Some
comments, in no particular order:

* Looks like there is no querying of the terminal for capabilities, and
  there's no form of database of terminals.  See the man pages for
  *termcap and *terminfo for libraries that implement these things, and
  you can also check the Emacs source code which still has a lot of code
  that deals with that in the "term" directory.

* These are needed to know what you can spit out, and what you can
  expect to read in.  Assuming that the rest of the terminal world
  behaves like the random one you're using is a good recipe for getting
  something completely broken.

* Looking at the code in `edit', the little that it does have is very
  very broken.  (If this is a translation of linenoise, then feel free
  to forward it to whoever does that...)
  - You should generally prefer `write-bytes' and `read-byte' to avoid
    getting bogus character encoding in the way.  (But see above: you
    need to consult the terminal to know if it can give you 8-bit or
    even wide characters.)
  - There is no form of abstraction here, resulting in a monolithic
    piece of code that is going to be a maintenance disaster.  You
    should separate out the code that reads a key to a different
    function, and make it return some proper symbolic name (combined
    with characters for simple keys with an ascii equivalent).  This way
    you can also hook more functionality on more keys.
  - Some of these escape sequences look like they could work, but there
    are a ton of variations.  For example: I have seen the <up> key
    generate at least "\e[A", "\eOA", and "\e[1;1A".
  - The last form is interesting, since it's a new-ish way to represent
    keys with modifiers, so you get "\e[1;<N>A" with <N> being "1" to "8"
    indicating no-modifiers, shift, alt/meta, shift-meta, control, etc.
    There are similar generic escapes for most keys, and that's one case
    that is easy to parse.  As a different example, my xterm generates
    "\e[27;6;85~" for shift-control-U: the "27" is the generic prefix
    for these keys, "6" stands for the shift+control, "85" is the ascii
    of U, and "~" terminates the escape sequence.
  - Note BTW that the escape sequence "parser" is completely broken,
    since it assumes some given lengths in a very primitive way that
    would break in the presence of these generic escapes I mentioned
    above.  It is important to know about the escape sequences so that
    even if you don't know how to translate some to keys you still know
    when the escape sequence ends so you can ignore the whole key and
    not leave leftovers behind.

* Your tests are not great either -- they're similar to copy+paste tests
  which encapsulate a specific behavior, and the thing that really
  matters (that it actually works) is left untested.
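
To make the key-reading and escape-parsing points above concrete, here
is a rough sketch of a separate key reader.  It is only an illustration
under several assumptions: the names `read-key', `read-csi',
`interpret-csi', `decode-modifiers' and the tiny `final-names' table are
made up here (they are not from the code under discussion), it
hard-wires xterm-style sequences instead of consulting terminfo, and it
ignores the timing needed to tell a bare <esc> keypress from the start
of a sequence.  What it does show is reading bytes, returning symbolic
names, and always consuming a CSI sequence up to its final byte, even
when the key is not recognized.

```racket
#lang racket/base
;; Sketch only: all names below are hypothetical, and the key table is a
;; hard-wired xterm-like subset; a real reader would consult terminfo.
(require racket/match racket/string)

(define ESC 27)

;; Final bytes of cursor-key sequences (assumed subset).
(define final-names
  (hash #\A 'up #\B 'down #\C 'right #\D 'left #\H 'home #\F 'end))

;; xterm-style modifier parameter: N = 1 + bitmask
;; (bit 0 = shift, bit 1 = alt/meta, bit 2 = control).
(define (decode-modifiers n)
  (define bits (sub1 n))
  (append (if (bitwise-bit-set? bits 0) '(shift) '())
          (if (bitwise-bit-set? bits 1) '(meta) '())
          (if (bitwise-bit-set? bits 2) '(control) '())))

;; Interprets the parameter bytes and final character of a CSI sequence.
(define (interpret-csi params final)
  (define nums
    (map string->number
         (string-split (bytes->string/latin-1 params) ";")))
  (define name (hash-ref final-names final #f))
  (match nums
    ;; "\e[A": plain key
    [(list) #:when name (list name)]
    ;; "\e[1;<N>A": key with modifiers
    [(list 1 (? exact-positive-integer? n)) #:when name
     (cons name (decode-modifiers n))]
    ;; "\e[27;<N>;<code>~": generic form for an ASCII key with modifiers
    [(list 27 (? exact-positive-integer? n) (? byte? code))
     #:when (and (char=? final #\~) (< code 128))
     (cons (integer->char code) (decode-modifiers n))]
    ;; Unknown, but fully consumed, so nothing is left behind.
    [_ (list 'unknown params final)]))

;; Reads parameter/intermediate bytes up to the final byte (#x40-#x7E).
(define (read-csi in)
  (let loop ([acc '()])
    (define b (read-byte in))
    (cond
      [(eof-object? b) (list 'unknown (list->bytes (reverse acc)) #f)]
      [(<= #x40 b #x7E)
       (interpret-csi (list->bytes (reverse acc)) (integer->char b))]
      [else (loop (cons b acc))])))

;; Reads one key: a character for a plain byte, a list
;; (name modifier ...) for an escape sequence, or eof.
(define (read-key in)
  (define b (read-byte in))
  (cond
    [(eof-object? b) b]
    [(not (= b ESC)) (integer->char b)] ; byte-as-char, no decoding
    [else
     (define b2 (read-byte in))
     (cond
       [(eof-object? b2) 'escape]
       [(= b2 (char->integer #\[)) (read-csi in)]
       [(= b2 (char->integer #\O))      ; "\eOA" style
        (define b3 (read-byte in))
        (define name
          (and (byte? b3) (hash-ref final-names (integer->char b3) #f)))
        (if name (list name) (list 'unknown b3))]
       [else (list (integer->char b2) 'meta)])]))

;; Helper for trying it out on a byte string instead of a terminal.
(define (keys-of bstr)
  (define in (open-input-bytes bstr))
  (let loop ()
    (define k (read-key in))
    (if (eof-object? k) '() (cons k (loop)))))
```

With this, (keys-of #"a\e[A\eOA\e[1;5A\e[27;6;85~") gives a plain #\a,
then (up) twice, then (up control), then (#\U shift control); something
like #"\e[99;99zq" yields one `unknown' entry followed by #\q, with no
leftover bytes from the sequence.  Editing commands can then dispatch on
these values instead of on raw escape strings.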

Again, I wrote all of this in a kind of hope that you'll do this, but
practically speaking, this code is so far from working that if you care
for your time it would be best to avoid it.  In other words, you have
almost nothing done there, compared to the amount of work that should be
added.  (But be careful of my cheap reverse psychology...)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.