[plt-scheme] help on how to write a frequency-counting function in a more functional way
Hi Matthias, hi all,
please excuse the delay of my answer - it's an answer to this
discussion's state from yesterday morning only - I did the coding
yesterday, but didn't finish and couldn't do anything during working
hours today ( and also I'm not so fast at writing scheme code - and,
to be honest, understanding algorithms) - yet) ...
In fact the "cnt-AL" (functional update of alist) implementation was
what I would have liked to write in the beginning but could not really
imagine how to do, let alone the pretty "cnt-alst" implementation.
Regarding everything that has been posted to this thread in the
meantime, and also the general level of the posts, I don't think my
"experiments" will be of much interest to you - but really I don't
want to be impolite (just asking for information and not doing
anything myself ...), so I'll just write what I tried yesterday
regarding the following task:
>
> What I didn't time is writing out a sorted file at the end from the
> 'association map' because I figured that I can do this in linear
> time in all cases though I realize now that in the FP case, I get to
> traverse the actual elements in the file in linear time while in the
> Imperative case, I get to traverse ALL. Still, a vector traversal
> should be fast enough.
>
> I will leave it to you to time the rest. Report to the list what you
> find -- Matthias
First, in order to be able to compare the impact of the counting and
sorting steps, I removed from the functions the input-getting part
(also in my original task, I was operating on lists already at that
point). [This way, they also got easier as I could replace the before-
unknown-to-me recur by a simple named let :-). In fact I also assumed
that factoring out the IO operation would make the differences between
the algorithms even bigger, but it does not really look like this when
I take my result for define SIZE 1000000:
Original operation on file:
cnt-vec: cpu time: 3058 real time: 3107 gc time: 0
cnt-alst: cpu time: 6408 real time: 6626 gc time: 1477
cnt-BST: cpu time: 7927 real time: 7943 gc time: 64
Operation on list:
cnt-vec: cpu time: 601 real time: 602 gc time: 0
cnt-alst: cpu time: 1206 real time: 1233 gc time: 134
cnt-BST: cpu time: 6419 real time: 6438 gc time: 76
Regarding the sorting, I did not understand the sentence "in the FP
case, I get to traverse the actual elements in the file in linear time
while in the Imperative case, I get to traverse ALL." - to me "all"
would mean the same as linear, i.e. O(n)?
In fact for the sorting, I had just naively assumed I would use PLT's
sort for the list, and though I did not look up its implementation I
would assume it should be O (n log n) e.g. like a mergesort.
So I'd simply written
(define sort-alist
(lambda (lst)
(sort lst (lambda (x y) (> (cadr x) (cadr y))))))
For the vector, I don't know if this is stupid but I could not think
of anything better than transforming the vector into an alist first,
and then do the same sorting, like
(define sort-cnt-vec-from-list
(lambda (lst)
(sort-alist (vector->alist (cnt-vec-from-list lst)))))
(define vector->alist
(lambda (vec)
(do ((i 0 (+ i 1)) (res '() (if (= (vector-ref vec i) 0) res
(cons (list (+ i LOW) (vector-ref vec i)) res))))
((= i (vector-length vec)) (reverse res))
(void))))
So with this procedure, the sorting of the vector would also be in O
(n log n), and so the whole sorting step should not make any difference.
In fact my results confirmed this:
sort-cnt-vec-from-list: cpu time: 657 real time: 658 gc time: 0
sort-cnt-alst-from-list: cpu time: 1091 real time: 1093 gc time: 0
So these were my naive results so far - seemed better to just write
this than to not reply to this thread at all any more :-)
For the algorithms posted in the meantime, I will certainly take a
look at them but I also need time for this ... thanks to all of you
anyway for the helpful hints and explanations :-)
Ciao
Sigrid