[plt-scheme] mergesort stability
PLTers-
In a recent bit of hacking, I found that I required a stable sort
implementation.
Neither of MzLib's provided sorting functions (QUICKSORT, MERGESORT) are
guaranteed to be stable, so I started poking around on the web.
But then I was reminded by some webpage that the Merge Sort algorithm
typically *is* stable. So I asked myself, why isn't MzScheme's?
After staring at the MERGESORT implementation for a bit, and drawing a
couple of behavioral traces, I realized why MzScheme's implementation is
not guaranteed to be stable: because the sorting function doesn't know
whether its been passed a reflexive comparator or not.
There is a crucial bit of code where we actually *run* the LESS-THAN
parameter, and we have to decide what we will do when it returns #f.
This is what the cond-clauses look like now (L1 and L2 are the
intermediate lists that we are merging, and R is the result accumulator)
...
[(less-than (car l1) (car l2))
(merge (cdr l1) l2 (cons (car l1) r))]
[else (merge (cdr l2) l1 (cons (car l2) r))]))]
...
(it builds up R in reversed order and then reverse it at the end; that's
why we cons the lesser element onto R).
If LESS-THAN is bound to a non-reflexive comparator, this code will
produce a stable sort. If LESS-THAN is bound to a reflexive comparator,
then the sort is unstable. The heart of the problem is that the code
doesn't know when its looking at "equal modulo less-than" elements.
Example (notice that the numbers are ordered (w.r.t. their char) in the
original input list):
Welcome to MzScheme version 206, Copyright (c) 2004 PLT Scheme, Inc.
> (require (lib "list.ss"))
> (mergesort '((#\D 1) (#\E 0) (#\B 1) (#\A 1) (#\D 2) (#\C 0) (#\B 2)
(#\D 3) (#\A 2)) (lambda (x y) (char<=? (car x) (car y))))
((#\A 2) (#\A 1) (#\B 1) (#\B 2) (#\C 0) (#\D 3) (#\D 2) (#\D 1) (#\E 0))
> (mergesort '((#\D 1) (#\E 0) (#\B 1) (#\A 1) (#\D 2) (#\C 0) (#\B 2)
(#\D 3) (#\A 2)) (lambda (x y) (char<? (car x) (car y))))
((#\A 1) (#\A 2) (#\B 1) (#\B 2) (#\C 0) (#\D 1) (#\D 2) (#\D 3) (#\E 0))
The CHAR<? function's output is stable, while that of CHAR<=? is not.
I discussed the problem with Joe and John and Phillippe, and there are
definitely solutions to this. One is to actually pass in a seperate
equivalence relation; when the user wants a stable sort, they are
responsible for understanding what their equivalence relation is.
Another solution is to infer what the equivalence relation is from the
comparator itself. You can actually do this for both reflexive and
anti-reflexive comparators, if you're clever. (That is, you can write a
simple function that will map both CHAR<=? and CHAR<? to something
that's semantically equivalent to CHAR=?, by doing an (AND OR NOR)). I
don't know about comparators which are neither reflexive nor
anti-reflexive though...
Joe came up with the following clever solution (which at its heart uses
the above and-or-nor trick; its just implicit in the control flow). It
is a pretty simple change to the MERGESORT implementation, and I believe
that it produces a stable sort. Please note the changes to the
cond-clauses excerpted above. Remember that we're building up the list
in reverse-order, which makes the code a little non-intuitive.
(define mergesort ;; /stable
(polymorphic
(lambda (alox less-than)
(letrec ([split (lambda (alox r)
(cond
[(null? alox) r]
[(null? (cdr alox)) (cons alox r)]
[else (split (cdr alox) (cons (list (car alox)) r))]))]
[merge (lambda (l1 l2 r)
(cond
[(null? l1) (append! (reverse! r) l2)]
[(null? l2) (append! (reverse! r) l1)]
;; Changes to cond-clauses start here
[(less-than (car l2) (car l1))
(merge (cdr l2) l1 (cons (car l2) r))]
[(less-than (car l1) (car l2))
(merge (cdr l1) l2 (cons (car l1) r))]
[else
;; when unconstrained, revert to original order
;; for stability.
(merge (cdr l2) l1 (cons (car l2) r))]))]
;; Changes to cond-clauses end here
[map2 (lambda (l)
(cond
[(null? l) '()]
[(null? (cdr l)) l]
[else (cons (merge (car l) (cadr l) null)
(map2 (cddr l)))]))]
[until (lambda (l)
(if (null? (cdr l))
(car l)
(until (map2 l))))])
(if (null? alox)
null
(until (split alox null)))))))
I would like the official MzLib version to be changed to a stable sort;
it can be this, or something else, but we should provide *something*.
Everyone who doesn't need stability uses quicksort anyway, so the
slight efficiency hit (calling the comparator twice) shouldn't be an
issue. I shouldn't have to go track down Olin Shiver's (non-finalized)
SRFI-32 implementation just to get a stable-sort going.
-Felix