[racket-dev] Comparison functions and the `data' collection
On 06/11/2012 02:36 PM, Eli Barzilay wrote:
> Yesterday, Danny Yoo wrote:
>>
>> It's a little unfortunate that there's a slight impedance mismatch
>> between what datum-order provides and what sort expects; the
>> my-less-than function in the example adapts the output of
>> datum-order so it can be used with sort.
>
> Thanks for pointing it out (I didn't know about it). When I looked
> into this, I saw that in addition to `data/order' there is also an
> issue with `data/heap'. It certainly does look like a problem worth
> addressing.
>
> The thing is that are two common ways to specify comparison functions:
> (a) three-valued comparison functions returning -1/0/+1, (b) a boolan
> strictly-smaller-than predicate. The first is common in several
> mainsteam languages and the second is traditionally common in lisps
> (which leads to its use in the sort function).
>
> The issues are described very thoroughly in srfi-67 -- specifically,
> see the first two subsections in section 6. I highly recommend
> reading that for this discussion. They decide to go with option (a),
> with an explanation for the choice of a three-valued function and for
> their choice of -1/0/+1 values.
I don't remember if I discovered srfi-67 before or after I added
data/order. In any case, I disagree with its rationale for -1/0/+1: I
dislike the idea of performing arithmetic on orderings. I also prefer
the pattern
(case (compare a b)
[(<) ...]
[(=) ...]
[(>) ...])
to the corresponding version with numbers.
OTOH, data/order is used primarily by the ordered dictionary types
(data/splay-tree, data/skip-list), so it's probably missing operations
and conveniences for tasks like sorting a list.
> Very briefly, each of the two options has an advantage: the first
> makes some cases faster since a single function call returns the
> relation between the two whereas the second requires two calls to
> distinguish equivalence from bigger-than. The second has an advantage
> of being more convenient. (Note that with `#:cache-keys?', there's
> probably some cases where it's easy to speed things up using some
> preprocessing, but probably not in all cases.)
>
> So I see three problems here:
>
> 1. There might be a need for a variant of `sort' that is based on a
> three-valued predicate. That's certainly doable, but I don't
> remember anyone needing this in practice yet. Furthermore, doing
> this kind of thing in such a core-ish function seems questionable
> without more support for 3-valued comparisons -- that is, without
> such actual functions that compare numbers, strings etc. So
> perhaps this is better done in some additional library.
>
> 2. The `data/order' interface has several problems that I think is
> best to resolve.
>
> - Regadless of the above, it seems like a good idea to extend the
> interface with a simple boolean predicate. Maybe something like
> `datum<?' and possibly others. This would address the issue that
> Danny raised above.
From an order you can get the less-than and equality predicates:
(order-<? datum-order)
(order-=? datum-order)
> - I also think that it's not a great choice to require two
> functions for creating orders when in many cases a single boolean
> valued function can work fine. Providing a way to create an
> order with a single function would make it easy to translate a
> boolean predicate into an order. Is there any reason to require
> an explicit equality?
No, I can add 'order-from-<?' and 'order-from-<=?'. As you point out
below, it would be difficult to add these options the 'order' function
itself, though.
> I don't know what would be a good way to do that. AFAICT, this
> is not something that can fit the current interface because the
> equality is the first argument -- if it was the second, then it
> could be made optional. But in addition to that I see at least
> one place (`order') where a single input function is assumed to
> be a comparator. There could be a function that translates a
> boolean predicate to a comparator but IMO it's very confusing as
> is, that you can't just hand it over directly.
>
> - Another problem is the choice of '< '= and'> instead of the much
> more popular -1 0 +1. This could be hand-waved away if it wasn't
> in actual use in our neighborhood, but there is already srfi-67,
> and I think that `soegaard/galore' on planet uses that too, and
> generally it seems like a bad idea to go with something different
> that leads to a constant need for translating values back and
> forth. This comes in addition to the actual reasons for using
> the integer values which are listed in the srfi (like using the
> result as an index offset, getting a reversed order with a simple
> negation, etc).
See above.
> 3. When I looked for mentions of sorting in the data collection, I
> noticed `data/heap': this code *does* use a binary comparison, but
> it's expecting a `<=?' instead of a `<?'. This makes it
> incompatible with both (the current) `data/order' and with `sort'.
> Is there a particular reason for this odd choice? (The best guess
> I have is that this is a translation of some internal code?)
IIRC, data/heap came before data/order and hasn't been updated.
> In addition to that, there is the order of inputs to `heap-sort!'
> which contradicts the order of inputs to `sort', which again, seems
> like an inconsistent interface.
Yes, that just seems like a mistake.
Ryan