[plt-scheme] help on how to write a frequency-counting function in a more functional way

From: keydana at gmx.de (keydana at gmx.de)
Date: Mon Apr 20 16:10:01 EDT 2009

Hi Matthias, hi all,

please excuse the delay of my answer - it's an answer to this  
discussion's state from yesterday morning only - I did the coding  
yesterday, but didn't finish and couldn't do anything during working  
hours today ( and also I'm not so fast at writing scheme code - and,  
to be honest, understanding algorithms) - yet) ...
In fact the "cnt-AL" (functional update of alist) implementation was  
what I would have liked to write in the beginning but could not really  
imagine how to do, let alone the pretty  "cnt-alst" implementation.

Regarding everything that has been posted to this thread in the  
meantime, and also the general level of the posts, I don't think my  
"experiments" will be of much interest to you - but really I don't  
want to be impolite (just asking for information and not doing  
anything myself ...), so I'll just write what I tried yesterday  
regarding the following task:


>
> What I didn't time is writing out a sorted file at the end from the  
> 'association map' because I figured that I can do this in linear  
> time in all cases though I realize now that in the FP case, I get to  
> traverse the actual elements in the file in linear time while in the  
> Imperative case, I get to traverse ALL. Still, a vector traversal  
> should be fast enough.
>
> I will leave it to you to time the rest. Report to the list what you  
> find -- Matthias

First, in order to be able to compare the impact of the counting and  
sorting steps, I removed from the functions the input-getting part  
(also in my original task, I was operating on lists already at that  
point). [This way, they also got easier as I could replace the before- 
unknown-to-me recur by a simple named let :-). In fact I also assumed  
that factoring out the IO operation would make the differences between  
the algorithms even bigger, but it does not really look like this when  
I take my result for define SIZE 1000000:


Original operation on file:
cnt-vec: 		cpu time: 3058 real time: 3107 gc time: 0
cnt-alst: 		cpu time: 6408 real time: 6626 gc time: 1477
cnt-BST:		cpu time: 7927 real time: 7943 gc time: 64

Operation on list:
cnt-vec:		cpu time: 601 real time: 602 gc time: 0
cnt-alst:		cpu time: 1206 real time: 1233 gc time: 134
cnt-BST:		cpu time: 6419 real time: 6438 gc time: 76


Regarding the sorting, I did not understand the sentence "in the FP  
case, I get to traverse the actual elements in the file in linear time  
while in the Imperative case, I get to traverse ALL." - to me "all"  
would mean the same as linear, i.e. O(n)?
In fact for the sorting, I had just naively assumed I would use PLT's  
sort for the list, and though I did not look up its implementation I  
would assume it should be O (n log n) e.g. like a mergesort.
So I'd simply written

(define sort-alist
   (lambda (lst)
     (sort lst (lambda (x y) (> (cadr x) (cadr y))))))

For the vector, I don't know if this is stupid but I could not think  
of anything better than transforming the vector into an alist first,   
and then do the same sorting, like

(define sort-cnt-vec-from-list
   (lambda (lst)
     (sort-alist (vector->alist (cnt-vec-from-list lst)))))

(define vector->alist
   (lambda (vec)
     (do ((i 0 (+ i 1)) (res '() (if (= (vector-ref vec i) 0) res  
(cons (list (+ i LOW) (vector-ref vec i)) res))))
       ((= i (vector-length vec)) (reverse res))
       (void))))

So with this procedure, the sorting of the vector would also be in O  
(n log n), and so the whole sorting step should not make any difference.

In fact my results confirmed this:

sort-cnt-vec-from-list:	 cpu time: 657 real time: 658 gc time: 0
sort-cnt-alst-from-list:	 cpu time: 1091 real time: 1093 gc time: 0


So these were my naive results so far - seemed better to just write  
this than to not reply to this thread at all any more :-)
For the algorithms posted in the meantime, I will certainly take a  
look at them but I also need time for this ... thanks to all of you  
anyway for the helpful hints and explanations :-)

Ciao
Sigrid






Posted on the users mailing list.