[plt-scheme] help on how to write a frequency-counting function in a more functional way
Consider writing a file generator (or list generator) that produces
sample inputs close to your real use case. That may reveal something
none of us has thought of -- Matthias
On Apr 20, 2009, at 4:10 PM, keydana at gmx.de wrote:
> Hi Matthias, hi all,
>
> please excuse the delay of my answer - it's an answer to this
> discussion's state from yesterday morning only - I did the coding
> yesterday, but didn't finish and couldn't do anything during
> working hours today ( and also I'm not so fast at writing scheme
> code - and, to be honest, understanding algorithms) - yet) ...
> In fact the "cnt-AL" (functional update of alist) implementation
> was what I would have liked to write in the beginning but could not
> really imagine how to do, let alone the pretty "cnt-alst"
> implementation.
>
> Regarding everything that has been posted to this thread in the
> meantime, and also the general level of the posts, I don't think my
> "experiments" will be of much interest to you - but really I don't
> want to be impolite (just asking for information and not doing
> anything myself ...), so I'll just write what I tried yesterday
> regarding the following task:
>
>
>>
>> What I didn't time is writing out a sorted file at the end from
>> the 'association map' because I figured that I can do this in
>> linear time in all cases though I realize now that in the FP case,
>> I get to traverse the actual elements in the file in linear time
>> while in the Imperative case, I get to traverse ALL. Still, a
>> vector traversal should be fast enough.
>>
>> I will leave it to you to time the rest. Report to the list what
>> you find -- Matthias
>
> First, in order to be able to compare the impact of the counting
> and sorting steps, I removed from the functions the input-getting
> part (also in my original task, I was operating on lists already at
> that point). [This way, they also got easier as I could replace the
> before-unknown-to-me recur by a simple named let :-). In fact I
> also assumed that factoring out the IO operation would make the
> differences between the algorithms even bigger, but it does not
> really look like this when I take my result for define SIZE 1000000:
>
>
> Original operation on file:
> cnt-vec: cpu time: 3058 real time: 3107 gc time: 0
> cnt-alst: cpu time: 6408 real time: 6626 gc time: 1477
> cnt-BST: cpu time: 7927 real time: 7943 gc time: 64
>
> Operation on list:
> cnt-vec: cpu time: 601 real time: 602 gc time: 0
> cnt-alst: cpu time: 1206 real time: 1233 gc time: 134
> cnt-BST: cpu time: 6419 real time: 6438 gc time: 76
>
>
> Regarding the sorting, I did not understand the sentence "in the FP
> case, I get to traverse the actual elements in the file in linear
> time while in the Imperative case, I get to traverse ALL." - to me
> "all" would mean the same as linear, i.e. O(n)?
> In fact for the sorting, I had just naively assumed I would use
> PLT's sort for the list, and though I did not look up its
> implementation I would assume it should be O (n log n) e.g. like a
> mergesort.
> So I'd simply written
>
> (define sort-alist
> (lambda (lst)
> (sort lst (lambda (x y) (> (cadr x) (cadr y))))))
>
> For the vector, I don't know if this is stupid but I could not
> think of anything better than transforming the vector into an alist
> first, and then do the same sorting, like
>
> (define sort-cnt-vec-from-list
> (lambda (lst)
> (sort-alist (vector->alist (cnt-vec-from-list lst)))))
>
> (define vector->alist
> (lambda (vec)
> (do ((i 0 (+ i 1)) (res '() (if (= (vector-ref vec i) 0) res
> (cons (list (+ i LOW) (vector-ref vec i)) res))))
> ((= i (vector-length vec)) (reverse res))
> (void))))
>
> So with this procedure, the sorting of the vector would also be in
> O (n log n), and so the whole sorting step should not make any
> difference.
>
> In fact my results confirmed this:
>
> sort-cnt-vec-from-list: cpu time: 657 real time: 658 gc time: 0
> sort-cnt-alst-from-list: cpu time: 1091 real time: 1093 gc time: 0
>
>
> So these were my naive results so far - seemed better to just write
> this than to not reply to this thread at all any more :-)
> For the algorithms posted in the meantime, I will certainly take a
> look at them but I also need time for this ... thanks to all of you
> anyway for the helpful hints and explanations :-)
>
> Ciao
> Sigrid
>
>
>
>