[plt-scheme] help on how to write a frequency-counting function in a more functional way
Sigrid, just to get a first idea, I coded up the four most natural
solutions to this problem with a simple stress test and timed them:
-- vector
-- sort and count (PR's suggestion)
-- association list
-- binary search tree
The third one is so slow that I gave up on it. The vector is the base
line, sort-and-count is twice as slow, and BST is twice as slow as that.
What I didn't time is writing out a sorted file at the end from the
'association map' because I figured that I can do this in linear time
in all cases though I realize now that in the FP case, I get to
traverse the actual elements in the file in linear time while in the
Imperative case, I get to traverse ALL. Still, a vector traversal
should be fast enough.
I will leave it to you to time the rest. Report to the list what you
find -- Matthias
P.S. You can get even better FP solutions, though I doubt you get
close to the vector solution unless you have really strange
distributions of the frequencies.
#lang scheme
(require mzlib/etc)
(define HIGH 22000)
(define LOW 80)
;; Frequency is in [LOW ... HIGH)
(define SIZE 1000000)
;; String Nat -> Void
;; generate file f with n frequencies between 80 and 22000
(define (gen f n)
(define (freq) (+ LOW (random (- HIGH LOW))))
(with-output-to-file f
(lambda () (for ((i (in-range n))) (printf "~s " (freq))))
#:exists 'truncate))
(gen "tmp1" SIZE)
;; read file f and count how many times each frequency occurs in a
vector
;; String -> [Vectorof Nat]
;; imperative vector update
(define (cnt-vec f)
(define a (make-vector (- HIGH LOW)))
(define (up freq)
(define i (- freq LOW))
(vector-set! a i (+ (vector-ref a i) 1)))
(with-input-from-file f
(rec loop
(lambda ()
(define nxt (read))
(unless (eof-object? nxt) (up nxt) (loop)))))
a)
;; String -> [Listof [List Nat Nat]]
;; sort, then create association list
(define (cnt-alst f)
(define l:in
(with-input-from-file f
(rec L
(lambda ()
(define nxt (read))
(if (eof-object? nxt) '() (cons nxt (L)))))))
(define l:st (sort l:in <))
(let L ([l (cdr l:st)][p (car l:st)][c 1])
(if (null? l) '()
(let ([a (car l)])
(if (= a p)
(L (cdr l) p (+ c 1))
(cons (list p c) (L (cdr l) (car l) 1)))))))
;; String (-> Association) (Association Nat -> Association) ->
Association
;; create association list on the fly
(define (cnt-fp f nu up)
(with-input-from-file f
(lambda ()
(let L ([a (nu)])
(define nxt (read))
(if (eof-object? nxt) a (L (up a nxt)))))))
(define (cnt-AL f)
(cnt-fp f
(lambda () '())
(lambda (al freq)
(let L ((al al))
(if (null? al)
(list (list freq 1))
(let* ([a (car al)]
[key (car a)])
(if (= key freq)
(cons (list key (+ (cadr a) 1)) (cdr al))
(cons a (L (cdr al))))))))))
(define (cnt-BST f)
(define-struct node (lft info count rgt))
;; A BST is one of:
;; -- '()
;; -- (make-node BST Frequency Nat BST)
(cnt-fp f
(lambda () '())
(lambda (a freq)
(let L ([bst a])
(if (null? bst)
(make-node '() freq 1 '())
(let* ([a (node-info bst)]
[lft (node-lft bst)]
[rgt (node-rgt bst)]
[cnt (node-count bst)])
(cond
[(< freq a) (make-node (L lft) a cnt rgt)]
[(= freq a) (make-node lft a (+ cnt 1) rgt)]
[else (make-node lft a cnt (L rgt))])))))))
;;
------------------------------------------------------------------------
-----
(define (test f) (collect-garbage) (let ([r (time (f "tmp1"))]) (void)))
(test cnt-vec)
(test cnt-alst)
;; (test cnt-AL) ;;; takes too long for my taste
(test cnt-BST)