[plt-scheme] help on how to write a frequency-counting function in a more functional way

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Sat Apr 18 16:27:01 EDT 2009

If the list is very long, an equal-key hash table is definitely the
way to go (unless you expect there to be very few duplicates). If it
is very long (with lots of duplicates) and you really want to do
something purely functional, there are some clever purely function
balanced binary tree algorithms out there (I'd start with Chris
Okasaki's book to learn more about them). If the list is very long and
there are not many duplicates, then I'm not sure of the best
algorithm, but I'd probably start with the hash-table and experiment
from there.

If the list is short, then using an assocation table as the
accumulator is probably just fine and is purely functional. You'd do
something similar to what you have below, except instead of mutating
the box, you can remove the old association from the list and replace
with a new one that has the updated value. Probably easiest to use
'filter' to remove the old association.

hth,
Robby

On Sat, Apr 18, 2009 at 3:18 PM, keydana at gmx.de <keydana at gmx.de> wrote:
> Hi,
> today on one of the extremely rare occasions where I can use scheme at work,
> I had to get at the frequencies for a list of ids extracted from a logfile.
> E.g., I had a list like
> '(1234 1234 5678 1234)
> and wanted to, on the one hand, remove the duplicates and, on the other
> hand, store how many instances there were of each id.
> For example, if represented as an alist, the result might have looked like
> this:
> '((1234 3) (5678 1))
> However, I could not imagine a "purely functional" way to do this, and so I
> used a box to update the frequencies, like so:
> (define count
>  (lambda (lst)
>    (let count-accum ((lst lst) (frequencies '()))
>      (if (null? lst)
>          frequencies
>          (let ((first (car lst)))
>            (cond ((assoc first frequencies)
>                   (let ((count-box (cdr (assoc first frequencies))))
>                     (begin (set-box! count-box (+ 1 (unbox count-box)))
>                            (count-accum (cdr lst) frequencies))))
>                  (else (count-accum (cdr lst) (cons (cons first (box 1))
> frequencies)))))))))
>
> (I also considered a hashtable, which would have been equally "impure" and
> less practical, because the next processing step again required a list).
> I suppose there must be "more functional" (if I may say so) and more elegant
> ways to perform this task. I'd be glad to learn and thankful for any
> suggestions :-)
> Ciao
> Sigrid
>
>
> _________________________________________________
>  For list-related administrative tasks:
>  http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>
>


Posted on the users mailing list.