[racket] Looking for feedback on code style
David,
Looks like Phil beat me to it, but here's some code that finds the n-th element of a list in O(N) time. The algorithm is similar to quicksort, but you don't sort both the sub-lists: partition the list into elements less than and greater or equal to a pivot. By counting the number of elements less than the pivot, you know whether the one you're looking for will be found in that set, or in the greater or equal set. Repeat the search in the appropriate set. The partition takes O(N) time. On average, it cuts the number of elements searched in half each time, so we have
total time = N + N/2 + N/4 + ... = 2*N as N --> infty
The following code isn't exactly tuned for speed, but the asymptotic behavior is correct:
#lang racket
(require srfi/67)
(define compare-procedure/c
(-> any/c any/c (integer-in -1 1)))
(provide/contract
(find-nth (-> list? natural-number/c compare-procedure/c any))
(median (-> (listof real?) number?))) ;; Median requires real, since you can't order complex numbers.
(define (all-equal? list compare)
(or (null? list)
(null? (cdr list))
(let ((head (car list)))
(andmap (lambda (elt) (=? compare head elt)) (cdr list)))))
(define (find-nth list i compare)
(when (null? list)
(error 'find-nth "empty list"))
(cond
((null? (cdr list))
(if (= i 0)
(car list)
(error 'find-nth "index out of range")))
((all-equal? list compare)
(car list))
(else
(let ((N (length list)))
(when (>= i N)
(error 'find-nth "index out of range"))
(let ((pivot (list-ref list (random N))))
(let-values (((lt gte)
(partition (lambda (elt) (<? compare elt pivot)) list)))
(let ((N-lt (length lt)))
(if (< i N-lt)
(find-nth lt i compare)
(find-nth gte (- i N-lt) compare)))))))))
(define (median list)
(let ((N (length list)))
(if (even? N)
(let ((N/2 (/ N 2)))
(/ (+ (find-nth list (sub1 N/2) real-compare)
(find-nth list N/2 real-compare))
2))
(find-nth list (floor (/ N 2)) real-compare))))
And a test suite:
#lang racket
(require rackunit
"median.rkt"
srfi/67)
(define (random-list N)
(build-list N (lambda (i) (random))))
;; (find-nth list i compare) should be the same (but asymptotically
;; faster) than (list-ref (sort list (<? compare ...)) i).
(for ((i (in-range 100)))
(let* ((l (random-list (+ (random 100) 20)))
(sorted-l (sort l (lambda (x y) (<? real-compare x y)))))
(let ((i (random (length l))))
(check-equal? (list-ref sorted-l i)
(find-nth l i real-compare)))))
;; Throw errors on index out of range and empty list?
(check-true (with-handlers ((exn:fail? (lambda (exn) #t)))
(find-nth '(1 2 3 4 5) 6 default-compare)))
(check-true (with-handlers ((exn:fail? (lambda (exn) #t)))
(find-nth '() 0 default-compare)))
(check-equal? (median '(1)) 1)
(check-equal? (median '(9 8 7 2 3 5 6)) 6) ;; Mathematica thinks this is correct.
(check-equal? (median '(9 8 7 2 3 5 6 10)) 13/2) ;; This, too.
Enjoy!
Will
On Sep 9, 2010, at 10:26 AM, David Van Horn wrote:
> On 9/9/10 10:04 AM, Prabhakar Ragde wrote:
>> I don't think vectors help very much in this case (median-finding). For
>> the given code, the O(n) access to the middle of the list is dominated
>> by the cost of the sorting, which is at least O(n log n) [*].
>>
>> It is theoretically possible to compute the median in O(n) time, but the
>> method is complicated and not very practical. But sorting definitely
>> does too much work. If only the median (or the kth largest, a problem
>> called "selection") is needed, a method which is both practical and of
>> pedagogical interest stems from adapting Quicksort, which is a good
>> exercise after section 25.2 of HtDP. This has expected cost O(n) on
>> random data, and vectors offer no asymptotic advantage over lists. --PR
>>
>> [*] Technically, "at least Omega(n log n)".
>
> The original post got me interested in median algorithms and I started to read up on the selection problem. Wikipedia (I know, I know) says the same thing as you: medians can be computed in O(n) time and points to selection as the way to do it. But I don't see how to use selection to achieve a O(n)-time median algorithm -- selection (of the kth largest/smallest element) is O(n), but that's were k is some fixed constant. To compute the median, you let k=n/2 (right?), so it's no longer constant. Can you point me to (or sketch) a O(n) method? Or just correct me if my reasoning is going astray.
>
> Thanks!
>
> David
> _________________________________________________
> For list-related administrative tasks:
> http://lists.racket-lang.org/listinfo/users