[plt-scheme] Re: DrScheme Faster on Powerset than MzScheme - Why?

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Mon Mar 19 22:56:12 EDT 2007

At Fri, 16 Mar 2007 03:25:17 +0000 (UTC), Kyle Smith wrote:
> Thanks for looking at this.  First to Robby's question.  I ran it with and 
> without debugging and with and without it being compiled to .zo.  But, I didn't
> notice any significant change in times.   The numbers I posted were from my XP
> box, and they are consistent.  I just ran the same test on my Mac Pro box, and
> mzscheme was consistently ~200ms quicker than DrScheme (all times faster than 
> on the XP box.)  I never got the results that Jon got, where running (go) 
> several times made any difference.
> 
> So the issue is with my XP box.  That will teach me to stop using it and 
> always run my benchmarks on my more consistent, well behaved, and faster OS X 
> machine.  It's strange though, because both machines are dual 3.0GHz Xeon 
> boxes with 4GB of memory, so you'd expect to see similar results.  But since 
> I can only make it happen on a single machine it has to be something with the 
> way the XP system is handling the heap I would suspect.
> 
> I appreciate all of your input.  I think we can safely say this is a
> machine-specific anomaly.

No, it's not unusual.

When GC time is a significant part of a program's run time, the
program can run faster in DrScheme than in MzScheme. The reason is that
the program collects garbage less often in DrScheme, which in turn is
because DrScheme starts with a larger heap than MzScheme. (A collection
is triggered when current memory use reaches a threshold relative to
total memory at the last collection.)
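
To make the trigger rule concrete, here is a toy model of it. This is a
sketch only, not MzScheme's actual collector: the name
`count-collections', the growth factor of 2, and the assumption that
everything the program allocates is garbage are all made up for
illustration.

```scheme
;; Toy model of the trigger rule (hypothetical, not the real collector).
;; baseline: memory in use right after each collection (the live heap)
;; step:     memory allocated per step, all of it garbage
;; steps:    number of allocation steps to simulate
;; factor:   collect when memory in use reaches factor * baseline
(define (count-collections baseline step steps factor)
  (let loop ((i 0) (in-use baseline) (gcs 0))
    (if (= i steps)
        gcs
        (let ((next (+ in-use step)))
          (if (>= next (* factor baseline))
              ;; Threshold reached: collect, dropping back to the baseline.
              (loop (+ i 1) baseline (+ gcs 1))
              ;; Below the threshold: keep allocating.
              (loop (+ i 1) next gcs))))))

;; Same allocation work, two different starting heap sizes.
(display (count-collections 10 1 1000 2))   ; small heap
(newline)
(display (count-collections 100 1 1000 2))  ; heap 10 times larger
(newline)
```

In this model the small heap collects 100 times and the larger heap 10
times for the same 1000 allocation steps, which is the direction of the
effect described above.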

This effect is also usually smaller with 3m (the default in v369.x),
but not always. Then again, since 3m does a better job of cleaning up
garbage, it has less "memory" about whether you're going to use a lot
of memory, so the effect can happen more consistently (as in this
example when running `(go)' several times).

A competing effect, meanwhile, is that more live objects in memory mean
more GC work. This effect is much lower for 3m, due to its generational
collection.

Below are some timings on my machine (MacBook: x86, 2GHz), each example
using `(begin (go) (go) (go))'. The last run is in MzScheme, but with the
program prefixed with `(define dummy (make-vector 10000000))' to
inflate the heap. As you can see, DrScheme does happen to be faster on
my machine with 3m, though not really with CGC, and allocating the
dummy vector in MzScheme erases the difference in the 3m case.
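
For anyone who wants to repeat the last experiment, the setup looks
roughly like this. The `powerset', `iota-list', and `go' definitions
here are hypothetical stand-ins, since the original benchmark is not
reproduced in this message, and the input size of 16 is arbitrary; only
the `dummy' line and the `(begin (go) (go) (go))' form are as described
above. `time' is the MzScheme form that prints the cpu/real/gc lines.

```scheme
;; Inflate the heap before the benchmark runs.
(define dummy (make-vector 10000000))

;; Hypothetical stand-in benchmark: build the list of all subsets.
(define (powerset lst)
  (if (null? lst)
      '(())
      (let ((rest (powerset (cdr lst))))
        (append rest
                (map (lambda (s) (cons (car lst) s)) rest)))))

;; Helper (hypothetical): the list (0 1 ... n-1).
(define (iota-list n)
  (let loop ((i (- n 1)) (acc '()))
    (if (< i 0)
        acc
        (loop (- i 1) (cons i acc)))))

;; Each call prints one "cpu time: ... real time: ... gc time: ..." line.
(define (go)
  (time (length (powerset (iota-list 16)))))

(begin (go) (go) (go))
```

Removing the `dummy' line gives the plain MzScheme run for comparison.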

Matthew

----------------------------------------

;; In MzScheme:
;;  3m
cpu time: 1282 real time: 1333 gc time: 872
cpu time: 1266 real time: 1275 gc time: 857
cpu time: 1264 real time: 1276 gc time: 859
;;  CGC
cpu time: 1806 real time: 1826 gc time: 1294
cpu time: 607 real time: 627 gc time: 188
cpu time: 399 real time: 402 gc time: 0

;; DrScheme, no debugging:
;;  3m
cpu time: 798 real time: 809 gc time: 408
cpu time: 727 real time: 738 gc time: 350
cpu time: 714 real time: 741 gc time: 340
;;  CGC
cpu time: 1614 real time: 1648 gc time: 1062
cpu time: 1738 real time: 1761 gc time: 1310
cpu time: 1665 real time: 1691 gc time: 1227

;; DrScheme (debugging):
;;  3m
cpu time: 935 real time: 977 gc time: 408
cpu time: 837 real time: 856 gc time: 349
cpu time: 828 real time: 857 gc time: 339
;;  CGC
cpu time: 1715 real time: 1761 gc time: 1041
cpu time: 1801 real time: 1838 gc time: 1255
cpu time: 1682 real time: 1704 gc time: 1142

;; MzScheme dummy 10M vector:
;;  3m
cpu time: 776 real time: 785 gc time: 320
cpu time: 668 real time: 675 gc time: 291
cpu time: 665 real time: 671 gc time: 289
;;  CGC
cpu time: 1544 real time: 1645 gc time: 1018
cpu time: 432 real time: 452 gc time: 0
cpu time: 403 real time: 406 gc time: 0



Posted on the users mailing list.