[racket] Case study: using the profiler to trace a performance issue in compiler/zo-parse
I've had similarly happy experiences with the profiler when working on
Redex's pattern matcher. It has been great. (+1 on the sorting thing.)
Thanks, Eli.
Robby
On Sat, Mar 17, 2012 at 7:26 PM, Danny Yoo <dyoo at cs.wpi.edu> wrote:
> Here's a small running "internal dialogue" to demonstrate how the use
> of a profiler can help detect unexpected performance problems.
>
> ---
>
> I was trying to trace a compilation performance issue in Whalesong.
> One of the things I've added to help me chase this is an internal
> debugging flag "--enable-profiling", that automatically runs Racket's
> statistical profiler as it's doing its work.
>
>
> So I run Whalesong on one of my programs and watch what comes out of
> the profiler. From earlier discussion, I run Racket without the JIT
> so that the profiler presents more useful information.
>
> ######
> $ racket --no-jit ../../whalesong.rkt build --enable-profiling nucleic2.rkt
> ######
>
> ... lots and lots of profiler output starts printing to screen.
>
> (It would be _very_ useful if the text renderer of the profiler were
> sorted in terms of self time, by the way! Can this be changed?)
>
>
> The profiler generates a sequence of "node" outputs, where each node
> represents a function call and its context. I start scanning down the
> "self" time column, which is how much time is being spent doing work
> in a particular function.
>
> One of the functions in the profile jumps out at me:
>
> #######################################################################
> Profiling results
> -----------------
> Total cpu time observed: 9144ms (out of 9268ms)
> Number of samples taken: 3595 (once every 3ms)
> ...
> [106] 1822(20.0%) 1822(20.0%) for-loop
> /home/dyoo/local/racket-5.2.1/lib/racket/collects/compiler/zo-parse.rkt:490:2
> #######################################################################
>
>
> Oh! About twenty percent of the time is being spent here, in parsing
> zo files! Whoa. Why? Let me take a look at that snippet of code.
> Here's is the surrounding context:
>
> ;; in zo-parse.rkt ;;;;;;;;;;;;;;;;;
> (define cpt-table
> ;; The "schcpt.h" mapping
> `([0 escape]
> [1 symbol]
> ;; ...
> ;; dyoo: omitting some output
> ;; ...
> [247 255 small-application]))
>
> (define (cpt-table-lookup i)
> (for/or ([ent (in-list cpt-table)])
> (match ent
> [(list k sym)
> (and (= k i) (cons k sym))]
> [(list k k* sym)
> (and (<= k i)
> (< i k*)
> (cons k sym))])))
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
> cpt-table-lookup defines the for-loop that the profiler reports as
> being a hot-spot. Oh, wait! The function looks odd. What
> cpt-table-lookup is doing is a lot of work, and I expected it to be a
> simple, constant table lookup. (In a sense, it is "constant", but the
> constant is larger than it should be.)
>
>
> Since cpt-table is itself a constant value, that leaves
> cpt-table-lookup ripe for an optimization: I can precompute a flat
> lookup table. I'll fix cpt-table-lookup by creating a vector,
> up-front, that flattens out the cpt-table. Here's what that looks
> like:
>
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (define cpt-table-as-vector (make-vector 256))
> (for ([ent (in-list cpt-table)])
> (match ent
> [(list k sym)
> (vector-set! cpt-table-as-vector k (cons k sym))]
> [(list k k* sym)
> (for ([i (in-range k k*)])
> (vector-set! cpt-table-as-vector i (cons k sym)))]))
>
> (define (cpt-table-lookup i)
> (vector-ref cpt-table-as-vector i))
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
>
> Ok, let's see what the profiler tells me now:
>
> ############################################
>
> Profiling results
> -----------------
> Total cpu time observed: 7052ms (out of 7148ms)
> Number of samples taken: 2812 (once every 3ms)
> ############################################
>
> Sweet. That helped cut it down from 9 seconds to 7. The node
> associated to the cpt-table-lookup function doesn't show up in the
> profile anymore, either. About a 20% performance improvement. Cool!
>
>
> The effect of this optimization isn't so pronounced when Racket is
> running under its JIT. Under the JIT, my program took 4.3 seconds
> before the optimization, and 3.9 seconds after the optimization,
> giving me a 9% improvement. Still, not too bad.
>
>
> I'll send a patch to compiler/zo-parse later. But I wanted to share
> the importance of the profiler: I would not have easily pinpointed
> zo-parse without it.
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users