[plt-scheme] help on how to write a frequency-counting function in a more functional way
Boxes are fine for jobs where the work. It is just a single mutable
cell (altho I don't use them much -- set! or having multiple mutable
fields in a struct often seems more useful). FWIW, PR's suggestion is
much better than the association list revision I suggested if you want
to avoid mutation.
Robby
On Sat, Apr 18, 2009 at 4:03 PM, keydana at gmx.de <keydana at gmx.de> wrote:
> Hi Robby, hi Matthias,
>
> thanks for your ultra-quick responses! In fact in my special case, the data
> turned out to be not in a Gaussian or equally-distributed form at all -
> there were very few rather high outliers, but most frequencies were just
> equal to one (some suspicion in this direction was even the reason for the
> task). So a vector representing the frequencies would have had lots of empty
> fields.
>
> For the other suggestions, I will write the association-list variant for
> practise (and perhaps also for performance comparison, though this is not a
> problem in my specific task), and have a look at the binary tree algorithms
> for interest.
> Regarding the hash table solution, how does it really compare to the box
> version - is it better performance, better style...?
> I must admit I would have "automatically" used a hash table had I not
> discovered the box datatype by reading PLAI (before that, I didn't know
> something like this existed in PLT. So this is the occasion to ask - is this
> a feature meant to be used in "production code" or rather in a learning
> context only?
>
> Thanks a lot
> Sigrid
>
>
>
>
>
> Am 18.04.2009 um 22:37 schrieb Robby Findler:
>
>> Oh, yes -- sorry that's the best if you know that the data is dense in
>> a particular range of values (and that range isn't huge).
>>
>> Robby
>>
>> On Sat, Apr 18, 2009 at 3:34 PM, Matthias Felleisen
>> <matthias at ccs.neu.edu> wrote:
>>>
>>> Frequencies come in ranges and my guess is that you have the full range
>>> of
>>> them.
>>> If so, I would have no hesitations giving up pure FP and use a vector in
>>> the
>>> proper range. I doubt very much any FP technique comes close in
>>> efficiency.
>>> -- Matthias
>>>
>>>
>>> On Apr 18, 2009, at 4:18 PM, keydana at gmx.de wrote:
>>>
>>> Hi,
>>> today on one of the extremely rare occasions where I can use scheme at
>>> work,
>>> I had to get at the frequencies for a list of ids extracted from a
>>> logfile.
>>> E.g., I had a list like
>>> '(1234 1234 5678 1234)
>>> and wanted to, on the one hand, remove the duplicates and, on the other
>>> hand, store how many instances there were of each id.
>>> For example, if represented as an alist, the result might have looked
>>> like
>>> this:
>>> '((1234 3) (5678 1))
>>> However, I could not imagine a "purely functional" way to do this, and so
>>> I
>>> used a box to update the frequencies, like so:
>>> (define count
>>> (lambda (lst)
>>> (let count-accum ((lst lst) (frequencies '()))
>>> (if (null? lst)
>>> frequencies
>>> (let ((first (car lst)))
>>> (cond ((assoc first frequencies)
>>> (let ((count-box (cdr (assoc first frequencies))))
>>> (begin (set-box! count-box (+ 1 (unbox count-box)))
>>> (count-accum (cdr lst) frequencies))))
>>> (else (count-accum (cdr lst) (cons (cons first (box 1))
>>> frequencies)))))))))
>>>
>>> (I also considered a hashtable, which would have been equally "impure"
>>> and
>>> less practical, because the next processing step again required a list).
>>> I suppose there must be "more functional" (if I may say so) and more
>>> elegant
>>> ways to perform this task. I'd be glad to learn and thankful for any
>>> suggestions :-)
>>> Ciao
>>> Sigrid
>>>
>>> _________________________________________________
>>> For list-related administrative tasks:
>>> http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>>>
>>> _________________________________________________
>>> For list-related administrative tasks:
>>> http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>>>
>>>
>
>