[plt-scheme] help on how to write a frequency-counting function in a more functional way

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Sat Apr 18 17:05:49 EDT 2009

Boxes are fine for jobs where the work. It is just a single mutable
cell (altho I don't use them much -- set! or having multiple mutable
fields in a struct often seems more useful). FWIW, PR's suggestion is
much better than the association list revision I suggested if you want
to avoid mutation.

Robby

On Sat, Apr 18, 2009 at 4:03 PM, keydana at gmx.de <keydana at gmx.de> wrote:
> Hi Robby, hi Matthias,
>
> thanks for your ultra-quick responses! In fact in my special case, the data
> turned out to be not in a Gaussian or  equally-distributed form at all -
> there were very few rather high outliers, but most frequencies were just
> equal to one (some suspicion in this direction was even the reason for the
> task). So a vector representing the frequencies would have had lots of empty
> fields.
>
> For the other suggestions, I will write the association-list variant for
> practise (and perhaps also for performance comparison, though this is not a
> problem in my specific task),  and have a look at the binary tree algorithms
> for interest.
> Regarding the hash table solution, how does it really compare to the box
> version - is it better performance, better style...?
> I must admit I would have "automatically" used a hash table had I not
> discovered the box datatype by reading PLAI (before that, I didn't know
> something like this existed in PLT. So this is the occasion to ask - is this
> a feature meant to be used in "production code" or rather in a learning
> context only?
>
> Thanks a lot
> Sigrid
>
>
>
>
>
> Am 18.04.2009 um 22:37 schrieb Robby Findler:
>
>> Oh, yes -- sorry that's the best if you know that the data is dense in
>> a particular range of values (and that range isn't huge).
>>
>> Robby
>>
>> On Sat, Apr 18, 2009 at 3:34 PM, Matthias Felleisen
>> <matthias at ccs.neu.edu> wrote:
>>>
>>> Frequencies come in ranges and my guess is that you have the full range
>>> of
>>> them.
>>> If so, I would have no hesitations giving up pure FP and use a vector in
>>> the
>>> proper range. I doubt very much any FP technique comes close in
>>> efficiency.
>>> -- Matthias
>>>
>>>
>>> On Apr 18, 2009, at 4:18 PM, keydana at gmx.de wrote:
>>>
>>> Hi,
>>> today on one of the extremely rare occasions where I can use scheme at
>>> work,
>>> I had to get at the frequencies for a list of ids extracted from a
>>> logfile.
>>> E.g., I had a list like
>>> '(1234 1234 5678 1234)
>>> and wanted to, on the one hand, remove the duplicates and, on the other
>>> hand, store how many instances there were of each id.
>>> For example, if represented as an alist, the result might have looked
>>> like
>>> this:
>>> '((1234 3) (5678 1))
>>> However, I could not imagine a "purely functional" way to do this, and so
>>> I
>>> used a box to update the frequencies, like so:
>>> (define count
>>>  (lambda (lst)
>>>   (let count-accum ((lst lst) (frequencies '()))
>>>     (if (null? lst)
>>>         frequencies
>>>         (let ((first (car lst)))
>>>           (cond ((assoc first frequencies)
>>>                  (let ((count-box (cdr (assoc first frequencies))))
>>>                    (begin (set-box! count-box (+ 1 (unbox count-box)))
>>>                           (count-accum (cdr lst) frequencies))))
>>>                 (else (count-accum (cdr lst) (cons (cons first (box 1))
>>> frequencies)))))))))
>>>
>>> (I also considered a hashtable, which would have been equally "impure"
>>> and
>>> less practical, because the next processing step again required a list).
>>> I suppose there must be "more functional" (if I may say so) and more
>>> elegant
>>> ways to perform this task. I'd be glad to learn and thankful for any
>>> suggestions :-)
>>> Ciao
>>> Sigrid
>>>
>>> _________________________________________________
>>>  For list-related administrative tasks:
>>>  http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>>>
>>> _________________________________________________
>>>  For list-related administrative tasks:
>>>  http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>>>
>>>
>
>


Posted on the users mailing list.