[racket-dev] Code micro-level organization

From: Eli Barzilay (eli at barzilay.org)
Date: Wed May 30 17:40:20 EDT 2012

I'm going to ramble a bit about organizing code, trying to look for an
idea for a good solution -- so spread a few kgs of salt over the
following (if you care to read it).

The problem that I'm talking about has several manifestations.  The
most obvious one is code-drift towards the RHS.  A less obvious
problem is that code is sometimes hard to read.  To use a cooked-up
example:

  (let ([str (string-trim (substring "foo bar baz" 3 8))])
    (and (regexp-match? #rx"^[a-z].*[a-z]$" str)
         (string-append "*" str "*")))

to read this, you start from the string literal, then read the
`substring' expression, then `string-trim', then the `let' binding,
then the `and' and finally the `string-append'[*].  To relate this to the
above: besides the right-drift (which is of course very minor here),
it takes time to "internalize" the language rules that lead to this
order, which is a problem for people new to functional programming
with its heavy use of nested function calls.  More than that, I think
that it's a problem for *experienced* hackers too -- to see what I
mean, open up any random piece of code that deals with an area you're
not familiar with, and try to read through it.  Personally, I often
find myself in such situations "reading" the actual ordering as I go
through the code, and that's fragile since I need to keep mental
fingers at various locations in the code in question, sometimes even
using my real fingers...

You'd probably recognize that there's a whole bunch of tools that are
trying to make things better.  A few random ones that I can think of
are:

  * The new semantics & blessing for using `define' forms instead of
    `let' etc makes code easier to read and avoids some right-drift.

  * There's the need (which I recently talked about at NEU) for some kind
    of a `define*' form that can be used as a definition with a `let*'
    scope.  For those who weren't there, the summary of the issue is
    something that Jay once said -- that he sometimes uses
      (define x0 ...)
      (define x1 (... x0 ...))
      (define x2 (... x1 ...))
    because he wants to avoid a `let*'.

  * The old `scheme/nest' is a direct attempt to prevent drift for
    some kinds of nestings.

  * There's the related suggestion for extending the reader with
    something like `$' or `//' that closes the rest of the sexpr in
    its own set of parens.

  * Every once in a while there's a suggestion to invert conversion
    functions, e.g., turn `string->number' into `number<-string' so it
    reads better.  In a similar direction, there are sometimes
    suggestions to use `compose' to make things more readable, as in
      ((compose f1 f2 f3 f4) x)
    vs
      (f1 (f2 (f3 (f4 x))))
    and the textual mess that the latter tends to end up as with real
    names.

  * SRFI-2 defines an `and-let*' that addresses a common pattern of
    interleaving nested `let's and `and's.  Actually, `cond' itself
    addresses this kind of problem too, so add here various
    suggestions for extending `cond' with binders, anaphoric forms,
    etc.

  * Recently, I looked at some Clojure pages (to hunt for new
    extensions to `racket/list'), and I saw that they have a
    "threading form" using `->' that expresses nested function calls.
    See this here:
      http://clojuredocs.org/clojure_core/clojure.core/-%3E
    and note also the other three variants, `->>', `-?>', and `-?>>'.

  * (The list goes on...)

(One common theme in all of these is that none of them are tools that
are actually needed -- they're all just ways to make code look
better.)
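To make the first few items concrete, here is the cooked-up example from the top rewritten in two of those styles: with internal `define's, so it reads top to bottom in evaluation order, and with `compose' for the string part.  This is only a sketch; the names `decorate' and `trim-middle' are made up for illustration.

```racket
#lang racket
;; Hypothetical name: decorate.  The same computation as the nested
;; let/and example, but each step is a define, in evaluation order.
(define (decorate s)
  (define piece (substring s 3 8))   ; for "foo bar baz" this is " bar "
  (define str (string-trim piece))   ; ... and this is "bar"
  (and (regexp-match? #rx"^[a-z].*[a-z]$" str)
       (string-append "*" str "*")))

;; Hypothetical name: trim-middle.  compose applies its LAST argument
;; first, so the steps are still listed in reverse order, and the
;; non-unary substring step needs a wrapper lambda.
(define trim-middle
  (compose string-trim (λ (s) (substring s 3 8))))

(decorate "foo bar baz")     ; => "*bar*"
(trim-middle "foo bar baz")  ; => "bar"
```

Note that `compose' still lists the functions last-to-first, and a step like `substring' needs a wrapper lambda, so it reads best when every step is a one-argument function.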

I actually started thinking about this when I saw the Clojure thing.
The first limitation I see in it is that it has four forms, where the
reason for the `->' vs `->>' split is to put the nesting in a
different argument position.  To summarize (and IIUC):

  (-> x
      (foo 1 2)
      (bar y))

expands to

  (bar (foo x 1 2) y)

whereas using a `->>' would make it expand to

  (bar y (foo 1 2 x))

Not only does it seem bad to me to have two bindings for this, we also
have the usual problem of the order-defying `regexp-replace', where
the action usually happens in the *middle* argument...  (Which is why
it ends up being a common example for showing these problems, as
happened recently.)
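For reference, here is a minimal sketch of the two Clojure forms as Racket `syntax-rules' macros.  It assumes every step is a parenthesized form (Clojure also accepts a bare function name), and these `->' and `->>' definitions are hypothetical, not something Racket provides.

```racket
#lang racket/base
;; Hypothetical thread-first: the value so far becomes the FIRST argument
;; of the next step, so (-> x (foo 1 2) (bar y)) => (bar (foo x 1 2) y).
(define-syntax ->
  (syntax-rules ()
    [(_ x) x]
    [(_ x (f arg ...) step ...) (-> (f x arg ...) step ...)]))

;; Hypothetical thread-last: the value so far becomes the LAST argument,
;; so (->> x (foo 1 2) (bar y)) => (bar y (foo 1 2 x)).
(define-syntax ->>
  (syntax-rules ()
    [(_ x) x]
    [(_ x (f arg ...) step ...) (->> (f arg ... x) step ...)]))

(-> 10 (- 1 2) (list 'y))   ; => '(7 y)
(->> 10 (- 1 2) (list 'y))  ; => '(y -11)
```

Neither macro can put the threaded value into the middle argument of `regexp-replace', which is the ordering problem described above.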

In any case, this looks like an easy thing to fix by adding an
explicit marker to the point where the nesting happens.  For example,
imagine a form that looks like this:

  (○ x
     (foo 1 <> 2)
     (bar y <>))

that expands to (bar y (foo 1 x 2)).  (The reason that Clojure has two
other forms (`-?>' and `-?>>') is related to what follows, so I'll
skip it for now.)
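A minimal sketch of such a form as a Racket macro follows.  It assumes `<>' appears exactly once and only at the top level of each step (holes inside nested subforms are not searched for).  `<>' is bound to a macro that raises an error, so `○' can recognize it with `free-identifier=?' and stray uses are rejected.

```racket
#lang racket/base
;; The transformer below uses filter, map, etc. at phase 1.
(require (for-syntax racket/base))

;; The <> hole: bound only so that ○ can recognize it; using it
;; anywhere outside ○ is a syntax error.
(define-syntax (<> stx)
  (raise-syntax-error #f "only allowed inside ○" stx))

;; (○ x step ...): plug x into the single top-level <> of the first
;; step, then continue with that result and the remaining steps.
(define-syntax (○ stx)
  (define (hole? part)
    (and (identifier? part) (free-identifier=? part #'<>)))
  (syntax-case stx ()
    [(_ e) #'e]
    [(_ e step more ...)
     (let ([parts (syntax->list #'step)])
       (unless (and parts (= 1 (length (filter hole? parts))))
         (raise-syntax-error #f "expected exactly one top-level <>" stx #'step))
       (with-syntax ([filled (datum->syntax #'step
                                            (map (λ (p) (if (hole? p) #'e p)) parts)
                                            #'step)])
         #'(○ filled more ...)))]))

(○ 2 (- 10 <> 3) (list 'y <>))  ; => (list 'y (- 10 2 3)) => '(y 5)
```

Since the hole can go in any argument position, the `regexp-replace' case needs no separate form: (○ "a-b" (regexp-replace #rx"-" <> "+")) evaluates to "a+b".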

The next thing that I tried is to contrast this with `nest'.  The
difference between them is that while both lead to a simpler syntax
for nested expressions, they do the nesting in different directions,
where (*very* roughly speaking) `->' nests things downwards and `nest'
nests them upwards:

  (-> X Y)    nests X into Y
  (nest X Y)  nests Y into X

or more generally:

  (-> X Y0 Y ...) nests X into Y0, then nests that result into Y ...
  (nest X Y ...)  nests the result of nesting Y ... into X
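To make the direction difference concrete, here is a hypothetical, simplified `nest' in which every form except the last is incomplete and gets the nesting of all the following forms appended as its final subform.  It is a sketch of the idea only, not the actual syntax of the old `scheme/nest'.

```racket
#lang racket/base
;; Hypothetical, simplified nest (NOT the scheme/nest syntax):
;; (nest (a 1) (b 2) c) => (a 1 (b 2 c)), i.e., later forms are
;; nested "upwards" into the earlier ones.
(define-syntax nest
  (syntax-rules ()
    [(_ x) x]
    [(_ (head part ...) more ...) (head part ... (nest more ...))]))

;; Expands to (let ([x 2]) (list 'a (* x 10))).
(nest (let ([x 2]))
      (list 'a)
      (* x 10))   ; => '(a 20)
```

Compare (nest (list 1) (list 2)), which gives '(1 (2)), with the thread-first (-> (list 1) (list 2)), which expands to (list (list 1) 2).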

So I tried to see if I could come up with something that can kill both
birds -- which is why I started with the above example:

  (let ([str (string-trim (substring "foo bar baz" 3 8))])
    (and (regexp-match? #rx"^[a-z].*[a-z]$" str)
         (string-append "*" str "*")))

Now, let's imagine that instead of a simple `<>' hole, there are two
kinds of holes with an "up" or a "down" direction -- this leads to
this kind of a syntax:

  (○ "foo bar baz"
     (substring ↑ 3 8)
     (string-trim ↑)
     (let ([str ↑]) ↓)
     (and (regexp-match? #rx"^[a-z].*[a-z]$" str) ↓)
     (string-append "*" str "*"))

where you can read `↑' as "the above" and `↓' as "the below".  What
makes me excited about this is that it reads in the same order as the
[*] reading described above.

There are still some problems with this though.  One problem is that
it can be ambiguous -- for example, I had this as one experiment:

  (○ (let ([str "foo bar baz"]) ↓)
     (substring str 3 8)
     (string-trim ↑)
     (string-append "*" ↑ "*"))

where the upward nesting (of `(substring str 3 8)' into
`(string-trim ↑)') could happen first.  This ambiguity is easy to
resolve with a simple rule: repeatedly merge the first two
expressions, stopping with an error unless there is either exactly one
down arrow in the first or exactly one up arrow in the second, and
finish when there's one expression left (throwing an error if it still
has arrows).  Using this, the expansion of the above goes through
these steps:

  ... ->
  (○ (let ([str "foo bar baz"]) (substring str 3 8))
     (string-trim ↑)
     (string-append "*" ↑ "*"))
  ->
  (○ (string-trim (let ([str "foo bar baz"]) (substring str 3 8)))
     (string-append "*" ↑ "*"))
  ->
  (○ (string-append "*" (string-trim (let ([str "foo bar baz"]) (substring str 3 8))) "*"))
  ->
  (string-append "*" (string-trim (let ([str "foo bar baz"]) (substring str 3 8))) "*")
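The merge rule is simple enough to model directly.  The sketch below works on quoted s-expressions rather than syntax objects, so it is a model of the rule and not a usable macro; the function names are hypothetical, and `↑' and `↓' are just ordinary symbols here.

```racket
#lang racket/base
;; Model of the ○ merge rule over quoted data (hypothetical names).

;; Count the occurrences of the symbol `hole` anywhere inside `e`.
(define (count-holes hole e)
  (cond [(eq? e hole) 1]
        [(pair? e) (+ (count-holes hole (car e)) (count-holes hole (cdr e)))]
        [else 0]))

;; Replace every occurrence of `hole` in `e` with `v`.
(define (fill-hole hole e v)
  (cond [(eq? e hole) v]
        [(pair? e) (cons (fill-hole hole (car e) v) (fill-hole hole (cdr e) v))]
        [else e]))

;; Merge two adjacent forms: a single ↓ in the first receives the second,
;; or a single ↑ in the second receives the first; anything else is an error.
(define (merge-two a b)
  (define downs (count-holes '↓ a))
  (define ups (count-holes '↑ b))
  (cond [(and (= downs 1) (= ups 0)) (fill-hole '↓ a b)]
        [(and (= downs 0) (= ups 1)) (fill-hole '↑ b a)]
        [else (error 'merge-two "ambiguous or missing arrow: ~s / ~s" a b)]))

;; Repeatedly merge the first two forms until one is left, then check
;; that no arrows remain in it.
(define (expand-○ forms)
  (if (null? (cdr forms))
      (let ([e (car forms)])
        (unless (zero? (+ (count-holes '↑ e) (count-holes '↓ e)))
          (error 'expand-○ "leftover arrows in ~s" e))
        e)
      (expand-○ (cons (merge-two (car forms) (cadr forms)) (cddr forms)))))

(expand-○ '((let ([str "foo bar baz"]) ↓)
            (substring str 3 8)
            (string-trim ↑)
            (string-append "*" ↑ "*")))
```

On the second example it returns '(string-append "*" (string-trim (let ([str "foo bar baz"]) (substring str 3 8))) "*").  On the forms of the first example (with the regexp written as a string, which `regexp-match?' also accepts, so the quoted data compares cleanly with `equal?') it reproduces the original `let' expression.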

It's also unclear whether this is generic enough.  I vaguely suspect
that there might be cases where you want arrows from multiple places
in the form, which would make this a kind of
literate-programming-like tool for micro-level code organization (and
yes, I intensely dislike LP, so that would be a bad thing).  In
addition, something like this should really have simple rules for how
it works, otherwise it's not something that anyone would want to use
or read.

BTW, I take the `nest' experiment as an example: the form itself is,
IMO, perfectly fine, but it suffered from having too many parentheses,
which made it hard to use.  One thing I like in the above is that the
explicit arrow markers make it much easier to read -- I think that
this is also an advantage over the clojure threading forms, where you
see a form like (take 10) and you have to look back at the arrow kind
that was used to know what this really is.

In any case, any thoughts about this?  I'd especially appreciate
little code layout horrors you might encounter, to see how such a form
can deal with them.  Feel free to reply off-list to avoid premature
bike-shedding.  (I'm *not* going to commit anything -- this is just
trying to roll around the idea to see if there's any point in doing
something like this.  *If* there is enough interest, then I'll post a
concrete suggestion when I have one.)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the dev mailing list.