[racket] Style and Performance question

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Mon May 9 22:34:42 EDT 2011

[Sam, Vincent: I have no clue how TR can beat R by two orders of magnitude. See end of message.]

I wrote a little function to test your two implementations of sqrt on lists of random numbers:

(define (time-and-compare LARGE)
  (define n (build-list LARGE (lambda (_) (random 10))))
  (collect-garbage) (collect-garbage) (collect-garbage)
  (define m (time (for-each approved-sqrt n)))
  (collect-garbage) (collect-garbage) (collect-garbage)
  (define l (time (for-each the-sqrt n)))
  (void))

As you can see below, approved-sqrt is consistently faster on 'short' lists (100 numbers) than my-sqrt; on 'long' lists (10,000 numbers) the opposite is true.  


On May 9, 2011, at 7:38 PM, Patrick King wrote:

> Question 1: Is my use of the temporary "error" worth it in terms of performance (the approved solution calculates (* x x) twice)? If it's not worth it, how big does the calculation have to become before it does?

It depends on where in the inner loop your calculations are and how many your program must perform. See below. 

> Can the compiler make such a temporary live only in a register?

In principle yes. I am not sure about our JIT. 


> Question 1a: How creeped are you by my changing the meaning of error for these few lines? I find myself doing stuff like this, in the name of my idea of readability, a lot.

Not one bit. I tend to agree with readability, though I catch myself thinking in the back of my head that this is also good for performance. (It shouldn't make  a difference.) 


> Question 2: The approved solution favored (/ ... (+ x x)) over my (/ ... 2 x). Cost of division versus readability. Wasn't a conscious decision of mine at the time, but now I'm curious. How smart is the compiler when presented with a division by 2? With multiple recalls of the same value?

The two versions differ in two regards: 

1. Your version flips the +/- calculations on the errors. (I eliminated this, the performance results stay the same.) 

2. You also replace an addition for duplication with a multiplication. A compiler would go the other way around (and I may do so too, and think how silly I am). 

I find myself unable to explain the differences in performance. Someone should probably look at the assembly and check what's going on. 


Here are five measurements for lists of 10,000 numbers: 

> > (time-and-compare 10000)
> cpu time: 1196 real time: 1223 gc time: 26
> cpu time: 1177 real time: 1229 gc time: 27
> > (time-and-compare 10000)
> cpu time: 1192 real time: 1242 gc time: 27
> cpu time: 1172 real time: 1250 gc time: 27
> > (time-and-compare 10000)
> cpu time: 1197 real time: 1308 gc time: 28
> cpu time: 1172 real time: 1215 gc time: 27
> > (time-and-compare 10000)
> cpu time: 1198 real time: 1230 gc time: 29
> cpu time: 1186 real time: 1286 gc time: 29
> > (time-and-compare 10000)
> cpu time: 1208 real time: 1289 gc time: 26
> cpu time: 1194 real time: 1303 gc time: 28

Now let's look at small values: 


> > (time-and-compare 100)
> cpu time: 27 real time: 28 gc time: 0
> cpu time: 33 real time: 40 gc time: 0
> > (time-and-compare 100)
> cpu time: 28 real time: 29 gc time: 0
> cpu time: 33 real time: 34 gc time: 0
> > (time-and-compare 100)
> cpu time: 27 real time: 30 gc time: 0
> cpu time: 32 real time: 38 gc time: 0
> > (time-and-compare 100)
> cpu time: 30 real time: 31 gc time: 0
> cpu time: 32 real time: 33 gc time: 0
> > (time-and-compare 100)
> cpu time: 26 real time: 27 gc time: 0
> cpu time: 34 real time: 33 gc time: 0


Now here is the scary part. Typed Racket is a TON faster and it was nearly no work to get types to work: 

> > (time-and-compare 10000)
> cpu time: 9 real time: 9 gc time: 0
> cpu time: 17 real time: 18 gc time: 0
> - : Boolean
> #t
> > (time-and-compare 10000)
> cpu time: 24 real time: 34 gc time: 0
> cpu time: 17 real time: 18 gc time: 0
> - : Boolean
> #t
> > (time-and-compare 10000)
> cpu time: 13 real time: 14 gc time: 0
> cpu time: 28 real time: 37 gc time: 0
> - : Boolean
> #t
> > (time-and-compare 10000)
> cpu time: 27 real time: 28 gc time: 0
> cpu time: 17 real time: 18 gc time: 0
> - : Boolean
> #t
> > (time-and-compare 10000)
> cpu time: 24 real time: 29 gc time: 0
> cpu time: 17 real time: 18 gc time: 0
> - : Boolean



;; -----------------------------------------------------------------------------
#lang typed/racket

(define epsilon 1e-7)

(: approved-sqrt : Natural -> Real)
(define (approved-sqrt y)
  (let loop [(x (/ y 2.0))]
    (if (< (abs (- (* x x) y)) epsilon)
        x
        (loop (- x (/ (- (* x x) y) (+ x x)))))))

(: my-sqrt : Natural -> Real)
(define (my-sqrt y)
  (let loop [(x (/ y 2.0))]
    (let [(error (- y (* x x)))]
      (if (< (abs error) epsilon) 
          x
          (loop (+ x (/ error 2 x)))))))

(: time-and-compare : Natural -> Boolean)
(define (time-and-compare LARGE)
  (define n (build-list LARGE (lambda (_) (random 10))))
  (collect-garbage) (collect-garbage) (collect-garbage)
  (define m (time (map approved-sqrt n)))
  (collect-garbage) (collect-garbage) (collect-garbage)
  (define l (time (map my-sqrt n)))
  (andmap = m l))

(time-and-compare 1000)



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/users/archive/attachments/20110509/00e9dced/attachment.html>

Posted on the users mailing list.