[plt-scheme] help on how to write a frequency-counting function in a more functional way
Hi Robby, hi Matthias,
thanks for your ultra-quick responses! In fact in my special case, the
data turned out to be not in a Gaussian or equally-distributed form
at all - there were very few rather high outliers, but most
frequencies were just equal to one (some suspicion in this direction
was even the reason for the task). So a vector representing the
frequencies would have had lots of empty fields.
For the other suggestions, I will write the association-list variant
for practise (and perhaps also for performance comparison, though this
is not a problem in my specific task), and have a look at the binary
tree algorithms for interest.
Regarding the hash table solution, how does it really compare to the
box version - is it better performance, better style...?
I must admit I would have "automatically" used a hash table had I not
discovered the box datatype by reading PLAI (before that, I didn't
know something like this existed in PLT. So this is the occasion to
ask - is this a feature meant to be used in "production code" or
rather in a learning context only?
Thanks a lot
Sigrid
Am 18.04.2009 um 22:37 schrieb Robby Findler:
> Oh, yes -- sorry that's the best if you know that the data is dense in
> a particular range of values (and that range isn't huge).
>
> Robby
>
> On Sat, Apr 18, 2009 at 3:34 PM, Matthias Felleisen
> <matthias at ccs.neu.edu> wrote:
>>
>> Frequencies come in ranges and my guess is that you have the full
>> range of
>> them.
>> If so, I would have no hesitations giving up pure FP and use a
>> vector in the
>> proper range. I doubt very much any FP technique comes close in
>> efficiency.
>> -- Matthias
>>
>>
>> On Apr 18, 2009, at 4:18 PM, keydana at gmx.de wrote:
>>
>> Hi,
>> today on one of the extremely rare occasions where I can use scheme
>> at work,
>> I had to get at the frequencies for a list of ids extracted from a
>> logfile.
>> E.g., I had a list like
>> '(1234 1234 5678 1234)
>> and wanted to, on the one hand, remove the duplicates and, on the
>> other
>> hand, store how many instances there were of each id.
>> For example, if represented as an alist, the result might have
>> looked like
>> this:
>> '((1234 3) (5678 1))
>> However, I could not imagine a "purely functional" way to do this,
>> and so I
>> used a box to update the frequencies, like so:
>> (define count
>> (lambda (lst)
>> (let count-accum ((lst lst) (frequencies '()))
>> (if (null? lst)
>> frequencies
>> (let ((first (car lst)))
>> (cond ((assoc first frequencies)
>> (let ((count-box (cdr (assoc first frequencies))))
>> (begin (set-box! count-box (+ 1 (unbox count-
>> box)))
>> (count-accum (cdr lst) frequencies))))
>> (else (count-accum (cdr lst) (cons (cons first
>> (box 1))
>> frequencies)))))))))
>>
>> (I also considered a hashtable, which would have been equally
>> "impure" and
>> less practical, because the next processing step again required a
>> list).
>> I suppose there must be "more functional" (if I may say so) and
>> more elegant
>> ways to perform this task. I'd be glad to learn and thankful for any
>> suggestions :-)
>> Ciao
>> Sigrid
>>
>> _________________________________________________
>> For list-related administrative tasks:
>> http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>>
>> _________________________________________________
>> For list-related administrative tasks:
>> http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>>
>>