[plt-scheme] help on how to write a frequency-counting function in a more functional way

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Mon Apr 20 20:30:46 EDT 2009

Consider writing a file generator (or list generator) that produces  
sample inputs close to your real use case. That may reveal something  
none of us has thought of -- Matthias

On Apr 20, 2009, at 4:10 PM, keydana at gmx.de wrote:

> Hi Matthias, hi all,
> please excuse the delay of my answer - it's an answer to this  
> discussion's state from yesterday morning only - I did the coding  
> yesterday, but didn't finish and couldn't do anything during  
> working hours today ( and also I'm not so fast at writing scheme  
> code - and, to be honest, understanding algorithms) - yet) ...
> In fact the "cnt-AL" (functional update of alist) implementation  
> was what I would have liked to write in the beginning but could not  
> really imagine how to do, let alone the pretty  "cnt-alst"  
> implementation.
> Regarding everything that has been posted to this thread in the  
> meantime, and also the general level of the posts, I don't think my  
> "experiments" will be of much interest to you - but really I don't  
> want to be impolite (just asking for information and not doing  
> anything myself ...), so I'll just write what I tried yesterday  
> regarding the following task:
>> What I didn't time is writing out a sorted file at the end from  
>> the 'association map' because I figured that I can do this in  
>> linear time in all cases though I realize now that in the FP case,  
>> I get to traverse the actual elements in the file in linear time  
>> while in the Imperative case, I get to traverse ALL. Still, a  
>> vector traversal should be fast enough.
>> I will leave it to you to time the rest. Report to the list what  
>> you find -- Matthias
> First, in order to be able to compare the impact of the counting  
> and sorting steps, I removed from the functions the input-getting  
> part (also in my original task, I was operating on lists already at  
> that point). [This way, they also got easier as I could replace the  
> before-unknown-to-me recur by a simple named let :-). In fact I  
> also assumed that factoring out the IO operation would make the  
> differences between the algorithms even bigger, but it does not  
> really look like this when I take my result for define SIZE 1000000:
> Original operation on file:
> cnt-vec: 		cpu time: 3058 real time: 3107 gc time: 0
> cnt-alst: 		cpu time: 6408 real time: 6626 gc time: 1477
> cnt-BST:		cpu time: 7927 real time: 7943 gc time: 64
> Operation on list:
> cnt-vec:		cpu time: 601 real time: 602 gc time: 0
> cnt-alst:		cpu time: 1206 real time: 1233 gc time: 134
> cnt-BST:		cpu time: 6419 real time: 6438 gc time: 76
> Regarding the sorting, I did not understand the sentence "in the FP  
> case, I get to traverse the actual elements in the file in linear  
> time while in the Imperative case, I get to traverse ALL." - to me  
> "all" would mean the same as linear, i.e. O(n)?
> In fact for the sorting, I had just naively assumed I would use  
> PLT's sort for the list, and though I did not look up its  
> implementation I would assume it should be O (n log n) e.g. like a  
> mergesort.
> So I'd simply written
> (define sort-alist
>   (lambda (lst)
>     (sort lst (lambda (x y) (> (cadr x) (cadr y))))))
> For the vector, I don't know if this is stupid but I could not  
> think of anything better than transforming the vector into an alist  
> first,  and then do the same sorting, like
> (define sort-cnt-vec-from-list
>   (lambda (lst)
>     (sort-alist (vector->alist (cnt-vec-from-list lst)))))
> (define vector->alist
>   (lambda (vec)
>     (do ((i 0 (+ i 1)) (res '() (if (= (vector-ref vec i) 0) res  
> (cons (list (+ i LOW) (vector-ref vec i)) res))))
>       ((= i (vector-length vec)) (reverse res))
>       (void))))
> So with this procedure, the sorting of the vector would also be in  
> O (n log n), and so the whole sorting step should not make any  
> difference.
> In fact my results confirmed this:
> sort-cnt-vec-from-list:	 cpu time: 657 real time: 658 gc time: 0
> sort-cnt-alst-from-list:	 cpu time: 1091 real time: 1093 gc time: 0
> So these were my naive results so far - seemed better to just write  
> this than to not reply to this thread at all any more :-)
> For the algorithms posted in the meantime, I will certainly take a  
> look at them but I also need time for this ... thanks to all of you  
> anyway for the helpful hints and explanations :-)
> Ciao
> Sigrid

Posted on the users mailing list.