[plt-scheme] help on how to write a frequency-counting function in a more functional way

From: keydana at gmx.de (keydana at gmx.de)
Date: Sat Apr 18 17:03:28 EDT 2009

Hi Robby, hi Matthias,

thanks for your ultra-quick responses! In fact in my special case, the  
data turned out to be not in a Gaussian or  equally-distributed form  
at all - there were very few rather high outliers, but most  
frequencies were just equal to one (some suspicion in this direction  
was even the reason for the task). So a vector representing the  
frequencies would have had lots of empty fields.

For the other suggestions, I will write the association-list variant  
for practise (and perhaps also for performance comparison, though this  
is not a problem in my specific task),  and have a look at the binary  
tree algorithms for interest.
Regarding the hash table solution, how does it really compare to the  
box version - is it better performance, better style...?
I must admit I would have "automatically" used a hash table had I not  
discovered the box datatype by reading PLAI (before that, I didn't  
know something like this existed in PLT. So this is the occasion to  
ask - is this a feature meant to be used in "production code" or  
rather in a learning context only?

Thanks a lot
Sigrid





Am 18.04.2009 um 22:37 schrieb Robby Findler:

> Oh, yes -- sorry that's the best if you know that the data is dense in
> a particular range of values (and that range isn't huge).
>
> Robby
>
> On Sat, Apr 18, 2009 at 3:34 PM, Matthias Felleisen
> <matthias at ccs.neu.edu> wrote:
>>
>> Frequencies come in ranges and my guess is that you have the full  
>> range of
>> them.
>> If so, I would have no hesitations giving up pure FP and use a  
>> vector in the
>> proper range. I doubt very much any FP technique comes close in  
>> efficiency.
>> -- Matthias
>>
>>
>> On Apr 18, 2009, at 4:18 PM, keydana at gmx.de wrote:
>>
>> Hi,
>> today on one of the extremely rare occasions where I can use scheme  
>> at work,
>> I had to get at the frequencies for a list of ids extracted from a  
>> logfile.
>> E.g., I had a list like
>> '(1234 1234 5678 1234)
>> and wanted to, on the one hand, remove the duplicates and, on the  
>> other
>> hand, store how many instances there were of each id.
>> For example, if represented as an alist, the result might have  
>> looked like
>> this:
>> '((1234 3) (5678 1))
>> However, I could not imagine a "purely functional" way to do this,  
>> and so I
>> used a box to update the frequencies, like so:
>> (define count
>>  (lambda (lst)
>>    (let count-accum ((lst lst) (frequencies '()))
>>      (if (null? lst)
>>          frequencies
>>          (let ((first (car lst)))
>>            (cond ((assoc first frequencies)
>>                   (let ((count-box (cdr (assoc first frequencies))))
>>                     (begin (set-box! count-box (+ 1 (unbox count- 
>> box)))
>>                            (count-accum (cdr lst) frequencies))))
>>                  (else (count-accum (cdr lst) (cons (cons first  
>> (box 1))
>> frequencies)))))))))
>>
>> (I also considered a hashtable, which would have been equally  
>> "impure" and
>> less practical, because the next processing step again required a  
>> list).
>> I suppose there must be "more functional" (if I may say so) and  
>> more elegant
>> ways to perform this task. I'd be glad to learn and thankful for any
>> suggestions :-)
>> Ciao
>> Sigrid
>>
>> _________________________________________________
>>   For list-related administrative tasks:
>>   http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>>
>> _________________________________________________
>>  For list-related administrative tasks:
>>  http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>>
>>



Posted on the users mailing list.