[racket-dev] Proposal for a "no-argument"

From: Eli Barzilay (eli at barzilay.org)
Date: Sun Jul 1 09:27:00 EDT 2012

There are rare cases where it is useful to have a value that means that no
argument was passed to a function.  In many of these cases there is a
plain value that is used as that mark, with the most idiomatic one
being #f, but sometimes others are used.  IMO, while such uses of #f
are idiomatic, they're a hack where an argument's domain is extended
only to mark "no argument".

A more robust way to do that, which has become idiomatic in Racket, is
to use (gensym).  (And as a sidenote, in other implementations there
are various similar eq-based hacks.)  IMO, this is an attempt to
improve on the #f case by guaranteeing a unique value, but at its core
it's still a similar hack.

Recently, I have extended the `add-between' function in a way that ran
against this problem at the interface level, where two keyword
arguments default to such a gensymed value to detect when no argument
is passed.  Naturally, this "leaked" into the documentation in the form
of using `....' to avoid specifying the default value and instead
talking about what happens when no argument is given for the keywords
in question.

After a short discussion that I had with Matthew, the new version uses
a new keyword that holds the unique no-value value, to simplify

    (define (foo x #:nothing [nothing (gensym)] [y nothing])
      (printf "y is ~s\n" (if (eq? y nothing) 'omitted y)))

The idea is that this does not depend on some specific unique value,
since one can be given.  For "end-users" of the function, there is no
need to know about this.  It's useful only for wrapper functions which
want to mirror the same behavior, and call `foo' in a way that passes
their own input on to it, including not passing it when their own
input is missing.  In this case, you'd do something like this:

    (define (bar #:nothing [nothing (gensym)] [x nothing])
      (foo 10 x #:nothing nothing))

This works, but I dislike this solution for several reasons:

1. Instead of finding a solution for the `gensym' problem, this
   approach embraces it as the proper way to do such things.

2. But more than that, it also exposes it in the interface of such
   functions, which means that "simple end users" need to read about
   it too.  There is no easy way to somehow say "you shouldn't worry
   about this unless you're writing a function that ...", and if you
   look at the current docs for `add-between' you'd probably wonder
   when that #:nothing is useful.

3. There is also a half-story in this documentation -- even though the
   docs look like the above function definition, you obviously would
   want to define a single global gensymmed value and use it, to avoid
   redundant allocation.  By the way the docs read, the above does
   look like the way to do these things, and I can see how a quick
   reading would make people believe that it's fine to write:

     (define (foo)
       (define (bar [x (gensym)]) ...)
       ... call bar many times ...)

I considered a bunch of alternatives to this, and the one closest to
looking reasonable is to use the #<undefined> value: it makes some
sense because it is a value that is already used in some "no value"
cases.  However, it is probably a bad idea to use it for something
different.  In fact, that's how many languages end up with false,
null, undefined, etc etc.

(As a side note, a different approach would be to use a per-argument
boolean flag that specifies whether the corresponding argument was
given.  Since this started with a documentation point of view, I'm
assuming that it won't be a good solution since it doesn't solve that
problem -- a function that uses it similarly to `add-between' would
still need to avoid specifying the default.)
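
For illustration, one way to get such a per-argument flag in current
Racket is to dispatch on the argument count with `case-lambda'.  This
is only a sketch: `lookup' and `lookup/flag' are made-up names, and it
wraps the built-in `hash-ref' for the sake of being runnable.

```racket
#lang racket/base

;; Hypothetical helper: `d-given?' is the per-argument flag that says
;; whether the caller supplied `d'.
(define (lookup/flag t k d d-given?)
  (cond [d-given? (hash-ref t k d)]   ; note: hash-ref calls a procedure `d' as a thunk
        [(hash-has-key? t k) (hash-ref t k)]
        [else (error 'lookup "no such key: ~e" k)]))

;; case-lambda picks a clause by argument count, so no sentinel value
;; is needed to detect a missing argument.
(define lookup
  (case-lambda
    [(t k)   (lookup/flag t k #f #f)]
    [(t k d) (lookup/flag t k d  #t)]))
```

The documented signature of `lookup' still cannot show a default for
`d', which is the documentation problem mentioned above.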

Instead, I suggest using a new "special" value, one that is used only
for this purpose.  The big difference from all of these special values
is that I'm proposing a value that is used only for this.  To
"discourage" using it for other reasons, there would be no binding for
it.  Instead, there would be a fake one, say `no-argument', which is
used only as a syntax in a default expression and only there the real
no-argument gets used -- so the value is actually hidden and
`no-argument' is a syntactic marker that is otherwise an error to use,
like `else' and `=>'.  (I'm not even suggesting making it a syntax
parameter that has a value in default expressions, because you
shouldn't be able to write (λ ([x (list no-argument)]) ...).)  The
only real binding that gets added is something that identifies that
value, or even more convenient, something like `given?' that checks
whether its input is *not* that value.

To demonstrate what this looks like, assume that the kernel has only a
three-argument `kernel-hash-ref', and you want to implement `hash-ref'
on top of it without using a thunk (which avoids the problem in a
different way).  The so-far-idiomatic code could be as follows:

    (define none (gensym)) ; private binding
    (define (hash-ref t k [d none])
      (cond [(not (eq? d none)) (kernel-hash-ref t k d)]
            [(not (has-key? t k)) (error "no such key")]
            [else (kernel-hash-ref t k 'whatever)]))

Using the new idiom, it would be:

    (define default-nothing (gensym)) ; private binding
    (define (hash-ref t k #:nothing [nothing default-nothing] [d nothing])
      (cond [(not (eq? d nothing)) (kernel-hash-ref t k d)]
            [(not (has-key? t k)) (error "no such key")]
            [else (kernel-hash-ref t k 'whatever)]))

And using my suggestion:

    (define (hash-ref t k [d no-argument])
      (cond [(given? d) (kernel-hash-ref t k d)]
            [(not (has-key? t k)) (error "no such key")]
            [else (kernel-hash-ref t k 'whatever)]))

Note that the code is essentially the same, only now there's no need
for the gensym hack or any of the similar things, and the interface to
the function is back to its uncluttered form.  The documentation would
use `no-argument', which would be linked to a description of how/when
to use it when people need to do so, while for most people the
description is clear from the name.
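
As a rough approximation of the `no-argument' idea in current Racket,
the marker can be emulated with a macro.  This is a sketch under
assumptions, not the proposed implementation: `define/opt' and
`the-no-argument' are hypothetical names, only the simple
[id no-argument] form of a default is recognized, and `lookup' wraps
the built-in `hash-ref' so that the example can run.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; The hidden value; in a real module it would not be exported.
(define the-no-argument (string->uninterned-symbol "no-argument"))

;; The only real binding: checks that its input is *not* that value.
(define (given? v) (not (eq? v the-no-argument)))

;; `no-argument' is a syntactic marker; using it as an expression is
;; a syntax error, like `else' and `=>'.
(define-syntax (no-argument stx)
  (raise-syntax-error #f "allowed only as a default expression" stx))

;; Hypothetical `define/opt': rewrites [id no-argument] formals so that
;; they default to the hidden value, and leaves other formals alone.
(define-syntax (define/opt stx)
  (syntax-case stx ()
    [(_ (name formal ...) body ...)
     (with-syntax ([(formal* ...)
                    (for/list ([f (in-list (syntax->list #'(formal ...)))])
                      (syntax-case f (no-argument)
                        [[id no-argument] #'[id the-no-argument]]
                        [other #'other]))])
       #'(define (name formal* ...) body ...))]))

(define/opt (lookup t k [d no-argument])
  (cond [(given? d) (hash-ref t k d)]
        [(hash-has-key? t k) (hash-ref t k)]
        [else (error 'lookup "no such key: ~e" k)]))
```

With these definitions, (lookup (hash 'a 1) 'b 'none) uses the given
default, while (lookup (hash 'a 1) 'b) raises an error.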

Two notes:

1. You might notice that there's no real need for the fake binding
   since it's just syntax.  For example, the same could be done with
   just dropping the default expression, as in

     (define (hash-ref t k [d]) ...)

   Keeping the binding there is useful since the syntax is still the
   same (eg, macros don't need to change since it looks like just
   another expression), and in case it is later desirable, it's easy
   to replace it by an actual binding (see below).

2. Unlike the #:nothing keyword, this is not 100% robust, since you
   *can* grab the actual value and pass it.  For example:

     (hash-ref t k ((λ ([x no-argument]) x)))

   I think that this is not a problem in practice, at least not a new
   one, since it's essentially the same situation as with
   #<undefined>, where it is possible to write an expression that
   evaluates to it.

   In addition, it is not 100% robust in that you can write broken
   code like:

     (define (hash-ref t k [d no-argument])
       (kernel-hash-ref t k d))

   and end up with the no-argument value as a result.  But this is
   also not a new problem, since the same mistake can apply to the
   other cases too.



Like I said, further on, there is the possibility of having a proper
`no-value' binding.  There are some obvious uses for it (eg, call a
function with two optional values and specify only the last); these
uses are arguably ones that should use keywords instead, but maybe
there's still a point for older functions.  Regardless of whether it's
a good idea or not, I think that if the need comes, then it could be
implemented similarly to a `Maybe' thing.  To abuse the names from a
different language, add a `some' constructor that is optional for any
value other than no-value, which makes it a kind of a quotation -- so
inputs like `5' are the same, `no-value' is the same as above, and
(some x) is the same as x for any value, including `no-value'.  (And
functions would need to check if it's a `some?'.)  But that's just a
random thought, not related to the above in any way other than showing
how that syntax leaves this possibility open.
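
A minimal sketch of how such a `some' quotation might behave, assuming
a real `no-value' binding exists.  All of the names here (`no-value',
`some', `unwrap', `lookup') are hypothetical, and `lookup' wraps the
built-in `hash-ref' only to keep the example runnable.

```racket
#lang racket/base

;; Assumed real binding for the "no value" marker.
(define no-value (string->uninterned-symbol "no-value"))

;; `some' quotes a value, so that even `no-value' itself can be passed.
(struct some (value))

(define (given? v) (not (eq? v no-value)))

;; Strip one layer of quotation: (some x) is the same as x.
(define (unwrap v) (if (some? v) (some-value v) v))

(define (lookup t k [d no-value])
  (cond [(given? d) (hash-ref t k (unwrap d))]
        [(hash-has-key? t k) (hash-ref t k)]
        [else (error 'lookup "no such key: ~e" k)]))
```

Here (lookup (hash) 'k 5) and (lookup (hash) 'k (some 5)) both return
5, and (lookup (hash) 'k (some no-value)) returns `no-value' itself as
the default.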


          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.