[racket] ranking items by their appearance in sets
On Sep 24, 2014, at 12:29 AM, David T. Pierson <dtp at mindstory.com> wrote:
> I'm looking for criticism in any aspect,
Style: Here is my rewrite with style suggestions. I liked to have a single point that tests all versions of a function when I conduct such design experiments. I also want to have all my test code in the test submodule so that it doesn't get deployed. That's why I am using a syntactic abstraction that unfolds into a submodule rather than a function that calls things.
Performance: I have not checked their performance but I cannot imagine the first version to be performant. See my annotations.
My choice: Your second version is what I would have written.
#lang racket
(module+ test
(require rackunit))
;; ---------------------------------------------------------------------------------------------------
;; [Listof [Setof Symbol]] -> [Listof [Cons Symbol Natural]]
;; rank the elements of the given sets based on how many sets they appear in
;; version 1
(define (rank-set-elements/1 list-of-sets) ;; n is number of symbols in sets
(define list-of-lists (map set->list list-of-sets)) ;; O(n)
(define flat-list (flatten list-of-lists)) ;; allocates a lot
;; I would have fused the next two lines and possibly all four
(define sorted-flat-list (sort flat-list symbol<?)) ;; O(n^2)?
(define list-of-item-frequency-pairs ;; critically relies on (sorted v)
(for/fold ((accum '())) ((v sorted-flat-list))
(match accum
;; this matches until the next, distinct symbol shows up
[(list-rest (cons sym cnt) rst)
(if (eq? sym v)
(cons (cons sym (add1 cnt)) rst)
(cons (cons v 1) accum))]
[else (list (cons v 1))])))
(sort list-of-item-frequency-pairs > #:key cdr))
;; version 2 [[ that's the version I would have written ]]
(define (rank-set-elements/2 list-of-sets)
(define hash-of-item-frequencies
(for*/fold ([frequencies (hash)]) ([a-set list-of-sets] [sym a-set])
(hash-update frequencies sym add1 0)))
;; I would fuse the next two lines
(define list-of-item-frequency-pairs (hash->list hash-of-item-frequencies))
(sort list-of-item-frequency-pairs > #:key cdr))
;; unit tests <
(define-syntax-rule
(test rank)
;; =>
(module+ test
(define input
(shuffle
(map list->set
'((a)
(a b)
(a b c)
(a b c d)
(a b c d e)))))
(define expected
'((a . 5)
(b . 4)
(c . 3)
(d . 2)
(e . 1)))
(check-equal? (rank '()) '())
(check-equal? (rank input) expected)))
(test rank-set-elements/1)
(test rank-set-elements/2)