[plt-scheme] mergesort stability

From: Felix PLT Proxy (pltscheme at pnkfx.org)
Date: Thu Jul 1 18:01:17 EDT 2004

PLTers-

In a recent bit of hacking, I found that I required a stable sort 
implementation.

Neither of MzLib's provided sorting functions (QUICKSORT, MERGESORT) are 
guaranteed to be stable, so I started poking around on the web.

But then I was reminded by some webpage that the Merge Sort algorithm 
typically *is* stable.  So I asked myself, why isn't MzScheme's?

After staring at the MERGESORT implementation for a bit, and drawing a 
couple of behavioral traces, I realized why MzScheme's implementation is 
not guaranteed to be stable: because the sorting function doesn't know 
whether its been passed a reflexive comparator or not.

There is a crucial bit of code where we actually *run* the LESS-THAN 
parameter, and we have to decide what we will do when it returns #f.

This is what the cond-clauses look like now (L1 and L2 are the 
intermediate lists that we are merging, and R is the result accumulator)

            ...
              [(less-than (car l1) (car l2))
               (merge (cdr l1) l2 (cons (car l1) r))]
              [else (merge (cdr l2) l1 (cons (car l2) r))]))]
            ...

(it builds up R in reversed order and then reverse it at the end; that's 
why we cons the lesser element onto R).

If LESS-THAN is bound to a non-reflexive comparator, this code will 
produce a stable sort.  If LESS-THAN is bound to a reflexive comparator, 
then the sort is unstable.  The heart of the problem is that the code 
doesn't know when its looking at "equal modulo less-than" elements.
 
Example (notice that the numbers are ordered (w.r.t. their char) in the 
original input list):

Welcome to MzScheme version 206, Copyright (c) 2004 PLT Scheme, Inc.
 > (require (lib "list.ss"))
 > (mergesort '((#\D 1) (#\E 0) (#\B 1) (#\A 1) (#\D 2) (#\C 0) (#\B 2) 
(#\D 3) (#\A 2)) (lambda (x y) (char<=? (car x) (car y))))
((#\A 2) (#\A 1) (#\B 1) (#\B 2) (#\C 0) (#\D 3) (#\D 2) (#\D 1) (#\E 0))
 > (mergesort '((#\D 1) (#\E 0) (#\B 1) (#\A 1) (#\D 2) (#\C 0) (#\B 2) 
(#\D 3) (#\A 2)) (lambda (x y) (char<? (car x) (car y))))
((#\A 1) (#\A 2) (#\B 1) (#\B 2) (#\C 0) (#\D 1) (#\D 2) (#\D 3) (#\E 0))

The CHAR<? function's output is stable, while that of CHAR<=? is not.

I discussed the problem with Joe and John and Phillippe, and there are 
definitely solutions to this.  One is to actually pass in a seperate 
equivalence relation; when the user wants a stable sort, they are 
responsible for understanding what their equivalence relation is.

Another solution is to infer what the equivalence relation is from the 
comparator itself.  You can actually do this for both reflexive and 
anti-reflexive comparators, if you're clever.  (That is, you can write a 
simple function that will map both CHAR<=? and CHAR<? to something 
that's semantically equivalent to CHAR=?, by doing an (AND OR NOR)).  I 
don't know about comparators which are neither reflexive nor 
anti-reflexive though...

Joe came up with the following clever solution (which at its heart uses 
the above and-or-nor trick; its just implicit in the control flow).  It 
is a pretty simple change to the MERGESORT implementation, and I believe 
that it produces a stable sort.  Please note the changes to the 
cond-clauses excerpted above.  Remember that we're building up the list 
in reverse-order, which makes the code a little non-intuitive.

  (define mergesort ;; /stable
    (polymorphic
     (lambda (alox less-than)
       (letrec ([split (lambda (alox r)
             (cond
              [(null? alox) r]
              [(null? (cdr alox)) (cons alox r)]
              [else (split (cdr alox) (cons (list (car alox)) r))]))]
        [merge (lambda (l1 l2 r)
             (cond
              [(null? l1) (append! (reverse! r) l2)]
              [(null? l2) (append! (reverse! r) l1)]
              ;; Changes to cond-clauses start here
              [(less-than (car l2) (car l1))  
               (merge (cdr l2) l1 (cons (car l2) r))]
              [(less-than (car l1) (car l2))
               (merge (cdr l1) l2 (cons (car l1) r))]
              [else
                 ;; when unconstrained, revert to original order
                 ;; for stability.
               (merge (cdr l2) l1 (cons (car l2) r))]))]
               ;; Changes to cond-clauses end here
        [map2 (lambda (l)
            (cond
             [(null? l) '()]
             [(null? (cdr l)) l]
             [else (cons (merge (car l) (cadr l) null)
                     (map2 (cddr l)))]))]
        [until (lambda (l)
             (if (null? (cdr l))
                 (car l)
                 (until (map2 l))))])
     (if (null? alox)
         null
         (until (split alox null)))))))


I would like the official MzLib version to be changed to a stable sort; 
it can be this, or something else, but we should provide *something*. 
 Everyone who doesn't need stability uses quicksort anyway, so the 
slight efficiency hit (calling the comparator twice) shouldn't be an 
issue.  I shouldn't have to go track down Olin Shiver's (non-finalized) 
SRFI-32 implementation just to get a stable-sort going.

-Felix




Posted on the users mailing list.