[racket-dev] Comparison functions and the `data' collection

From: Ryan Culpepper (ryan at cs.utah.edu)
Date: Mon Jun 11 17:40:24 EDT 2012

On 06/11/2012 02:36 PM, Eli Barzilay wrote:
> Yesterday, Danny Yoo wrote:
>>
>> It's a little unfortunate that there's a slight impedance mismatch
>> between what datum-order provides and what sort expects; the
>> my-less-than function in the example adapts the output of
>> datum-order so it can be used with sort.
>
> Thanks for pointing it out (I didn't know about it).  When I looked
> into this, I saw that in addition to `data/order' there is also an
> issue with `data/heap'.  It certainly does look like a problem worth
> addressing.
>
> The thing is that are two common ways to specify comparison functions:
> (a) three-valued comparison functions returning -1/0/+1, (b) a boolan
> strictly-smaller-than predicate.  The first is common in several
> mainsteam languages and the second is traditionally common in lisps
> (which leads to its use in the sort function).
>
> The issues are described very thoroughly in srfi-67 -- specifically,
> see the first two subsections in section 6.  I highly recommend
> reading that for this discussion.  They decide to go with option (a),
> with an explanation for the choice of a three-valued function and for
> their choice of -1/0/+1 values.

I don't remember if I discovered srfi-67 before or after I added 
data/order. In any case, I disagree with its rationale for -1/0/+1: I 
dislike the idea of performing arithmetic on orderings. I also prefer 
the pattern

   (case (compare a b)
     [(<) ...]
     [(=) ...]
     [(>) ...])

to the corresponding version with numbers.

OTOH, data/order is used primarily by the ordered dictionary types 
(data/splay-tree, data/skip-list), so it's probably missing operations 
and conveniences for tasks like sorting a list.

> Very briefly, each of the two options has an advantage: the first
> makes some cases faster since a single function call returns the
> relation between the two whereas the second requires two calls to
> distinguish equivalence from bigger-than.  The second has an advantage
> of being more convenient.  (Note that with `#:cache-keys?', there's
> probably some cases where it's easy to speed things up using some
> preprocessing, but probably not in all cases.)
>
> So I see three problems here:
>
> 1. There might be a need for a variant of `sort' that is based on a
>     three-valued predicate.  That's certainly doable, but I don't
>     remember anyone needing this in practice yet.  Furthermore, doing
>     this kind of thing in such a core-ish function seems questionable
>     without more support for 3-valued comparisons -- that is, without
>     such actual functions that compare numbers, strings etc.  So
>     perhaps this is better done in some additional library.
>
> 2. The `data/order' interface has several problems that I think is
>     best to resolve.
>
>     - Regadless of the above, it seems like a good idea to extend the
>       interface with a simple boolean predicate.  Maybe something like
>       `datum<?' and possibly others.  This would address the issue that
>       Danny raised above.

 From an order you can get the less-than and equality predicates:

   (order-<? datum-order)
   (order-=? datum-order)

>     - I also think that it's not a great choice to require two
>       functions for creating orders when in many cases a single boolean
>       valued function can work fine.  Providing a way to create an
>       order with a single function would make it easy to translate a
>       boolean predicate into an order.  Is there any reason to require
>       an explicit equality?

No, I can add 'order-from-<?' and 'order-from-<=?'. As you point out 
below, it would be difficult to add these options the 'order' function 
itself, though.

>       I don't know what would be a good way to do that.  AFAICT, this
>       is not something that can fit the current interface because the
>       equality is the first argument -- if it was the second, then it
>       could be made optional.  But in addition to that I see at least
>       one place (`order') where a single input function is assumed to
>       be a comparator.  There could be a function that translates a
>       boolean predicate to a comparator but IMO it's very confusing as
>       is, that you can't just hand it over directly.
>
>     - Another problem is the choice of '<  '= and'>  instead of the much
>       more popular -1 0 +1.  This could be hand-waved away if it wasn't
>       in actual use in our neighborhood, but there is already srfi-67,
>       and I think that `soegaard/galore' on planet uses that too, and
>       generally it seems like a bad idea to go with something different
>       that leads to a constant need for translating values back and
>       forth.  This comes in addition to the actual reasons for using
>       the integer values which are listed in the srfi (like using the
>       result as an index offset, getting a reversed order with a simple
>       negation, etc).

See above.

> 3. When I looked for mentions of sorting in the data collection, I
>     noticed `data/heap': this code *does* use a binary comparison, but
>     it's expecting a `<=?' instead of a `<?'.  This makes it
>     incompatible with both (the current) `data/order' and with `sort'.
>     Is there a particular reason for this odd choice?  (The best guess
>     I have is that this is a translation of some internal code?)

IIRC, data/heap came before data/order and hasn't been updated.

>     In addition to that, there is the order of inputs to `heap-sort!'
>     which contradicts the order of inputs to `sort', which again, seems
>     like an inconsistent interface.

Yes, that just seems like a mistake.

Ryan

Posted on the dev mailing list.