[racket-dev] Comparison functions and the `data' collection
Yesterday, Danny Yoo wrote:
>
> It's a little unfortunate that there's a slight impedance mismatch
> between what datum-order provides and what sort expects; the
> my-less-than function in the example adapts the output of
> datum-order so it can be used with sort.
Thanks for pointing it out (I didn't know about it). When I looked
into this, I saw that in addition to `data/order' there is also an
issue with `data/heap'. It certainly does look like a problem worth
addressing.
The thing is that are two common ways to specify comparison functions:
(a) three-valued comparison functions returning -1/0/+1, (b) a boolan
strictly-smaller-than predicate. The first is common in several
mainsteam languages and the second is traditionally common in lisps
(which leads to its use in the sort function).
The issues are described very thoroughly in srfi-67 -- specifically,
see the first two subsections in section 6. I highly recommend
reading that for this discussion. They decide to go with option (a),
with an explanation for the choice of a three-valued function and for
their choice of -1/0/+1 values.
Very briefly, each of the two options has an advantage: the first
makes some cases faster since a single function call returns the
relation between the two whereas the second requires two calls to
distinguish equivalence from bigger-than. The second has an advantage
of being more convenient. (Note that with `#:cache-keys?', there's
probably some cases where it's easy to speed things up using some
preprocessing, but probably not in all cases.)
So I see three problems here:
1. There might be a need for a variant of `sort' that is based on a
three-valued predicate. That's certainly doable, but I don't
remember anyone needing this in practice yet. Furthermore, doing
this kind of thing in such a core-ish function seems questionable
without more support for 3-valued comparisons -- that is, without
such actual functions that compare numbers, strings etc. So
perhaps this is better done in some additional library.
2. The `data/order' interface has several problems that I think is
best to resolve.
- Regadless of the above, it seems like a good idea to extend the
interface with a simple boolean predicate. Maybe something like
`datum<?' and possibly others. This would address the issue that
Danny raised above.
- I also think that it's not a great choice to require two
functions for creating orders when in many cases a single boolean
valued function can work fine. Providing a way to create an
order with a single function would make it easy to translate a
boolean predicate into an order. Is there any reason to require
an explicit equality?
I don't know what would be a good way to do that. AFAICT, this
is not something that can fit the current interface because the
equality is the first argument -- if it was the second, then it
could be made optional. But in addition to that I see at least
one place (`order') where a single input function is assumed to
be a comparator. There could be a function that translates a
boolean predicate to a comparator but IMO it's very confusing as
is, that you can't just hand it over directly.
- Another problem is the choice of '< '= and '> instead of the much
more popular -1 0 +1. This could be hand-waved away if it wasn't
in actual use in our neighborhood, but there is already srfi-67,
and I think that `soegaard/galore' on planet uses that too, and
generally it seems like a bad idea to go with something different
that leads to a constant need for translating values back and
forth. This comes in addition to the actual reasons for using
the integer values which are listed in the srfi (like using the
result as an index offset, getting a reversed order with a simple
negation, etc).
3. When I looked for mentions of sorting in the data collection, I
noticed `data/heap': this code *does* use a binary comparison, but
it's expecting a `<=?' instead of a `<?'. This makes it
incompatible with both (the current) `data/order' and with `sort'.
Is there a particular reason for this odd choice? (The best guess
I have is that this is a translation of some internal code?)
In addition to that, there is the order of inputs to `heap-sort!'
which contradicts the order of inputs to `sort', which again, seems
like an inconsistent interface.
(And a nitpick: the contract for `heap-sort!' uses `procedure?'
where other functions in this library use (-> any/c any/c any/c).)
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!