[plt-scheme] help on how to write a frequency-counting function in a more functional way

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Sat Apr 18 22:57:59 EDT 2009

Sigrid, just to get a first idea, I coded up the four most natural  
solutions to this problem with a simple stress test and timed them:

  -- vector
  -- sort and count (PR's suggestion)
  -- association list
  -- binary search tree

The third one is so slow that I gave up on it. The vector is the base  
line, sort-and-count is twice as slow, and BST is twice as slow as that.

What I didn't time is writing out a sorted file at the end from the  
'association map' because I figured that I can do this in linear time  
in all cases though I realize now that in the FP case, I get to  
traverse the actual elements in the file in linear time while in the  
Imperative case, I get to traverse ALL. Still, a vector traversal  
should be fast enough.

I will leave it to you to time the rest. Report to the list what you  
find -- Matthias

P.S. You can get even better FP solutions, though I doubt you get  
close to the vector solution unless you have really strange  
distributions of the frequencies.

#lang scheme

(require mzlib/etc)

(define HIGH 22000)
(define LOW     80)
;; Frequency is in [LOW ... HIGH)

(define SIZE 1000000)

;; String Nat -> Void
;; generate file f with n frequencies between 80 and 22000
(define (gen f n)
   (define (freq) (+ LOW (random (- HIGH LOW))))
   (with-output-to-file f
     (lambda () (for ((i (in-range n))) (printf "~s " (freq))))
     #:exists 'truncate))

(gen "tmp1" SIZE)

;; read file f and count how many times each frequency occurs in a  

;; String -> [Vectorof Nat]
;; imperative vector update
(define (cnt-vec f)
   (define a (make-vector (- HIGH LOW)))
   (define (up freq)
     (define i (- freq LOW))
     (vector-set! a i (+ (vector-ref a i) 1)))
   (with-input-from-file f
     (rec loop
       (lambda ()
         (define nxt (read))
         (unless (eof-object? nxt) (up nxt) (loop)))))

;; String -> [Listof [List Nat Nat]]
;; sort, then create association list
(define (cnt-alst f)
   (define l:in
     (with-input-from-file f
       (rec L
         (lambda ()
           (define nxt (read))
           (if (eof-object? nxt) '() (cons nxt (L)))))))
   (define l:st (sort l:in <))
   (let L ([l (cdr l:st)][p (car l:st)][c 1])
     (if (null? l) '()
         (let ([a (car l)])
           (if (= a p)
               (L (cdr l) p (+ c 1))
               (cons (list p c) (L (cdr l) (car l) 1)))))))

;; String (-> Association) (Association Nat -> Association) ->  
;; create association list on the fly
(define (cnt-fp f nu up)
   (with-input-from-file f
     (lambda ()
       (let L ([a (nu)])
         (define nxt (read))
         (if (eof-object? nxt) a (L (up a nxt)))))))

(define (cnt-AL f)
   (cnt-fp f
           (lambda () '())
           (lambda (al freq)
             (let L ((al al))
               (if (null? al)
                   (list (list freq 1))
                   (let* ([a (car al)]
                          [key (car a)])
                     (if (= key freq)
                         (cons (list key (+ (cadr a) 1)) (cdr al))
                         (cons a (L (cdr al))))))))))

(define (cnt-BST f)
   (define-struct node (lft info count rgt))
   ;; A BST is one of:
   ;; -- '()
   ;; -- (make-node BST Frequency Nat BST)
   (cnt-fp f
           (lambda () '())
           (lambda (a freq)
             (let L ([bst a])
               (if (null? bst)
                   (make-node '() freq 1 '())
                   (let* ([a (node-info bst)]
                          [lft (node-lft bst)]
                          [rgt (node-rgt bst)]
                          [cnt (node-count bst)])
                       [(< freq a) (make-node (L lft) a cnt rgt)]
                       [(= freq a) (make-node lft a (+ cnt 1) rgt)]
                       [else (make-node lft a cnt (L rgt))])))))))


(define (test f) (collect-garbage) (let ([r (time (f "tmp1"))]) (void)))

(test cnt-vec)
(test cnt-alst)
;; (test cnt-AL) ;;; takes too long for my taste
(test cnt-BST)

Posted on the users mailing list.