[racket] Style and Performance question

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Mon May 9 23:19:23 EDT 2011

Even without typed racket, this:

(define (my-sqrt y)
  (let loop [(x (/ y 2.0))]
    (let [(error (- y (* x x)))]
      (if (< (abs error) epsilon)
          x
          (loop (+ x (/ error 2 x)))))))

is much faster than this:

(define (my-sqrt y)
  (let loop [(x (/ y 2))]
    (let [(error (- y (* x x)))]
      (if (< (abs error) epsilon)
          x
          (loop (+ x (/ error 2 x)))))))

Robby

On Mon, May 9, 2011 at 9:34 PM, Matthias Felleisen <matthias at ccs.neu.edu> wrote:
>
> [Sam, Vincent: I have no clue how TR can beat R by two orders of magnitude.
> See end of message.]
> I wrote a little function to test your two implementations of sqrt on lists
> of random numbers:
> (define (time-and-compare LARGE)
>   (define n (build-list LARGE (lambda (_) (random 10))))
>   (collect-garbage) (collect-garbage) (collect-garbage)
>   (define m (time (for-each approved-sqrt n)))
>   (collect-garbage) (collect-garbage) (collect-garbage)
>   (define l (time (for-each the-sqrt n)))
>   (void))
> As you can see below, approved-sqrt is consistently faster on 'short' lists
> (100 numbers) than my-sqrt; on 'long' lists (10,000 numbers) the opposite is
> true.
>
> On May 9, 2011, at 7:38 PM, Patrick King wrote:
>
> Question 1: Is my use of the temporary "error" worth it in terms of
> performance (the approved solution calculates (* x x) twice)? If it's not
> worth it, how big does the calculation have to become before it does?
>
> It depends on where in the inner loop your calculations are and how many
> your program must perform. See below.
>
> Can the compiler make such a temporary live only in a register?
>
> In principle yes. I am not sure about our JIT.
>
> Question 1a: How creeped are you by my changing the meaning of error for
> these few lines? I find myself doing stuff like this, in the name of my idea
> of readability, a lot.
>
> Not one bit. I tend to agree with readability, though I catch myself
> thinking in the back of my head that this is also good for performance. (It
> shouldn't make  a difference.)
>
> Question 2: The approved solution favored (/ ... (+ x x)) over my (/ ... 2
> x). Cost of division versus readability. Wasn't a conscious decision of mine
> at the time, but now I'm curious. How smart is the compiler when presented
> with a division by 2? With multiple recalls of the same value?
>
> The two versions differ in two regards:
> 1. Your version flips the +/- calculations on the errors. (I eliminated
> this, the performance results stay the same.)
> 2. You also replace an addition for duplication with a multiplication. A
> compiler would go the other way around (and I may do so too, and think how
> silly I am).
> I find myself unable to explain the differences in performance. Someone
> should probably look at the assembly and check what's going on.
>
> Here are five measurements for lists of 10,000 numbers:
>
>> (time-and-compare 10000)
>
> cpu time: 1196 real time: 1223 gc time: 26
>
> cpu time: 1177 real time: 1229 gc time: 27
>
>> (time-and-compare 10000)
>
> cpu time: 1192 real time: 1242 gc time: 27
>
> cpu time: 1172 real time: 1250 gc time: 27
>
>> (time-and-compare 10000)
>
> cpu time: 1197 real time: 1308 gc time: 28
>
> cpu time: 1172 real time: 1215 gc time: 27
>
>> (time-and-compare 10000)
>
> cpu time: 1198 real time: 1230 gc time: 29
>
> cpu time: 1186 real time: 1286 gc time: 29
>
>> (time-and-compare 10000)
>
> cpu time: 1208 real time: 1289 gc time: 26
>
> cpu time: 1194 real time: 1303 gc time: 28
>
> Now let's look at small values:
>
>> (time-and-compare 100)
>
> cpu time: 27 real time: 28 gc time: 0
>
> cpu time: 33 real time: 40 gc time: 0
>
>> (time-and-compare 100)
>
> cpu time: 28 real time: 29 gc time: 0
>
> cpu time: 33 real time: 34 gc time: 0
>
>> (time-and-compare 100)
>
> cpu time: 27 real time: 30 gc time: 0
>
> cpu time: 32 real time: 38 gc time: 0
>
>> (time-and-compare 100)
>
> cpu time: 30 real time: 31 gc time: 0
>
> cpu time: 32 real time: 33 gc time: 0
>
>> (time-and-compare 100)
>
> cpu time: 26 real time: 27 gc time: 0
>
> cpu time: 34 real time: 33 gc time: 0
>
> Now here is the scary part. Typed Racket is a TON faster and it was nearly
> no work to get types to work:
>
>> (time-and-compare 10000)
> cpu time: 9 real time: 9 gc time: 0
> cpu time: 17 real time: 18 gc time: 0
> - : Boolean
> #t
>> (time-and-compare 10000)
> cpu time: 24 real time: 34 gc time: 0
> cpu time: 17 real time: 18 gc time: 0
> - : Boolean
> #t
>> (time-and-compare 10000)
> cpu time: 13 real time: 14 gc time: 0
> cpu time: 28 real time: 37 gc time: 0
> - : Boolean
> #t
>> (time-and-compare 10000)
> cpu time: 27 real time: 28 gc time: 0
> cpu time: 17 real time: 18 gc time: 0
> - : Boolean
> #t
>> (time-and-compare 10000)
> cpu time: 24 real time: 29 gc time: 0
> cpu time: 17 real time: 18 gc time: 0
> - : Boolean
>
>
> ;;
> -----------------------------------------------------------------------------
> #lang typed/racket
> (define epsilon 1e-7)
> (: approved-sqrt : Natural -> Real)
> (define (approved-sqrt y)
>   (let loop [(x (/ y 2.0))]
>     (if (< (abs (- (* x x) y)) epsilon)
>         x
>         (loop (- x (/ (- (* x x) y) (+ x x)))))))
> (: my-sqrt : Natural -> Real)
> (define (my-sqrt y)
>   (let loop [(x (/ y 2.0))]
>     (let [(error (- y (* x x)))]
>       (if (< (abs error) epsilon)
>           x
>           (loop (+ x (/ error 2 x)))))))
> (: time-and-compare : Natural -> Boolean)
> (define (time-and-compare LARGE)
>   (define n (build-list LARGE (lambda (_) (random 10))))
>   (collect-garbage) (collect-garbage) (collect-garbage)
>   (define m (time (map approved-sqrt n)))
>   (collect-garbage) (collect-garbage) (collect-garbage)
>   (define l (time (map my-sqrt n)))
>   (andmap = m l))
> (time-and-compare 1000)
>
>
>
> _________________________________________________
>  For list-related administrative tasks:
>  http://lists.racket-lang.org/listinfo/users
>



Posted on the users mailing list.