[racket-dev] Comparison functions and the `data' collection

From: Eli Barzilay (eli at barzilay.org)
Date: Mon Jun 11 16:36:32 EDT 2012

Yesterday, Danny Yoo wrote:
> 
> It's a little unfortunate that there's a slight impedance mismatch
> between what datum-order provides and what sort expects; the
> my-less-than function in the example adapts the output of
> datum-order so it can be used with sort.

Thanks for pointing it out (I didn't know about it).  When I looked
into this, I saw that in addition to `data/order' there is also an
issue with `data/heap'.  It certainly does look like a problem worth
addressing.

The thing is that are two common ways to specify comparison functions:
(a) three-valued comparison functions returning -1/0/+1, (b) a boolan
strictly-smaller-than predicate.  The first is common in several
mainsteam languages and the second is traditionally common in lisps
(which leads to its use in the sort function).

The issues are described very thoroughly in srfi-67 -- specifically,
see the first two subsections in section 6.  I highly recommend
reading that for this discussion.  They decide to go with option (a),
with an explanation for the choice of a three-valued function and for
their choice of -1/0/+1 values.

Very briefly, each of the two options has an advantage: the first
makes some cases faster since a single function call returns the
relation between the two whereas the second requires two calls to
distinguish equivalence from bigger-than.  The second has an advantage
of being more convenient.  (Note that with `#:cache-keys?', there's
probably some cases where it's easy to speed things up using some
preprocessing, but probably not in all cases.)

So I see three problems here:

1. There might be a need for a variant of `sort' that is based on a
   three-valued predicate.  That's certainly doable, but I don't
   remember anyone needing this in practice yet.  Furthermore, doing
   this kind of thing in such a core-ish function seems questionable
   without more support for 3-valued comparisons -- that is, without
   such actual functions that compare numbers, strings etc.  So
   perhaps this is better done in some additional library.

2. The `data/order' interface has several problems that I think is
   best to resolve.

   - Regadless of the above, it seems like a good idea to extend the
     interface with a simple boolean predicate.  Maybe something like
     `datum<?' and possibly others.  This would address the issue that
     Danny raised above.

   - I also think that it's not a great choice to require two
     functions for creating orders when in many cases a single boolean
     valued function can work fine.  Providing a way to create an
     order with a single function would make it easy to translate a
     boolean predicate into an order.  Is there any reason to require
     an explicit equality?

     I don't know what would be a good way to do that.  AFAICT, this
     is not something that can fit the current interface because the
     equality is the first argument -- if it was the second, then it
     could be made optional.  But in addition to that I see at least
     one place (`order') where a single input function is assumed to
     be a comparator.  There could be a function that translates a
     boolean predicate to a comparator but IMO it's very confusing as
     is, that you can't just hand it over directly.

   - Another problem is the choice of '< '= and '> instead of the much
     more popular -1 0 +1.  This could be hand-waved away if it wasn't
     in actual use in our neighborhood, but there is already srfi-67,
     and I think that `soegaard/galore' on planet uses that too, and
     generally it seems like a bad idea to go with something different
     that leads to a constant need for translating values back and
     forth.  This comes in addition to the actual reasons for using
     the integer values which are listed in the srfi (like using the
     result as an index offset, getting a reversed order with a simple
     negation, etc).

3. When I looked for mentions of sorting in the data collection, I
   noticed `data/heap': this code *does* use a binary comparison, but
   it's expecting a `<=?' instead of a `<?'.  This makes it
   incompatible with both (the current) `data/order' and with `sort'.
   Is there a particular reason for this odd choice?  (The best guess
   I have is that this is a translation of some internal code?)

   In addition to that, there is the order of inputs to `heap-sort!'
   which contradicts the order of inputs to `sort', which again, seems
   like an inconsistent interface.

   (And a nitpick: the contract for `heap-sort!' uses `procedure?'
   where other functions in this library use (-> any/c any/c any/c).)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.