Fwd: [plt-scheme] In Search of Equitable Temporal Assignment of GC time in Benchmarking

From: Kyle Smith (airfoil at bellsouth.net)
Date: Fri Jun 29 11:30:19 EDT 2007

William D Clinger wrote:
> Kyle Smith wrote:
>   
>> With the adoption of generational GCs, the non-trivial problem of
>> distributing GC time in benchmarking results is less of a factor.
>> However, there remain Scheme implementations that have not yet
>> adopted modern garbage collection algorithms, and these present a
>> non-trivial problem for benchmarking comparative function
>> execution time.
>>     
>
> Benchmarking of garbage collectors is very difficult
> because there are so many things that should be held
> constant, but the collectors you want to benchmark
> are so different that they cannot be controlled in
> ways that would maintain those constants.
>
>   
>> The problem becomes quite significant as the percentage of
>> reachable data in the heap increases while the available memory
>> decreases.
>>     
>
> We call this the load factor (reachable data divided
> by total memory in use) or inverse load factor (total
> memory in use divided by reachable data).  Just about
> any collector will be reasonably efficient (in an
> amortized sense) at low load factors, so you're mainly
> interested in the high load factors where benchmarking
> is most difficult.
>
>   
>> I make these statements based on my own experience with attempts
>> at benchmarking two different implementations that suffer this
>> problem.  In order to produce credible results, I need to account
>> for the GC time in an equitable fashion that would ideally
>> distribute it temporally over the real time of execution.  After
>> an ACM search, I didn't find anything that spoke to this problem
>> specifically.  Most of the literature, understandably, is focused
>> on coming up with better algorithms.
>>     
>
> There are two distinct things you might be talking about
> here.  The simpler thing is that you might be interested
> in average amortized performance, in which case the usual
> approach is to repeat the benchmark hundreds of times in
> the same execution and take the average.
>
> On the other hand, you might be looking for some measure
> of the collector's ability to distribute gc time over
> execution well enough to avoid impacting the mutator's
> responsiveness.  A search at Google Scholar on "garbage"
> and "minimum mutator utilization" will get you started
> on that topic.
>
> Will
>
>   
Thanks for your reply, Will.

I had come up with a simplistic algorithm for amortizing the
performance.  It involved executing one function in the benchmark
over the full input data set, which comprised lists of increasing
complexity, where I measure complexity with a simple metric: the
total count of pairs and atoms in a list.  I collect the sum of
execution time over some appropriately large number of iterations,
typically a thousand, which gives me a tuple of data for each
function and each list of known complexity; the tuple includes the
execution time as well as the GC time encountered at that data
point.
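
Sketched in MzScheme, the metric and one per-point measurement might
look something like this (the name measure, and treating the empty
list as an atom, are just illustrative choices on my part):

  ;; Complexity metric: the total count of pairs and atoms in a tree.
  ;; (Anything that is not a pair, including '(), counts as an atom.)
  (define (complexity x)
    (if (pair? x)
        (+ 1 (complexity (car x)) (complexity (cdr x)))
        1))

  ;; One data point: run fn on lst for n iterations, summing the cpu
  ;; and gc times reported by MzScheme's time-apply.
  (define (measure fn lst n)
    (let loop ((i 0) (cpu 0) (gc 0))
      (if (= i n)
          (list (complexity lst) cpu gc)  ; (complexity cpu-ms gc-ms)
          (let-values (((results cpu-ms real-ms gc-ms)
                        (time-apply fn (list lst))))
            (loop (+ i 1) (+ cpu cpu-ms) (+ gc gc-ms))))))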
The idea was that I could then use these tuples, along with the sum
of complexity over all lists processed and the sum of GC time
expended over the entire data set, to distribute the GC time back to
each data point in proportion to the ratio of that point's list
complexity to the total complexity.
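
In code, the redistribution I had in mind amounts to something like
this sketch (reusing the tuple layout from measure above):

  ;; Hand the total gc time back to each data point in proportion to
  ;; its share of the total complexity.  Each point is a list
  ;; (complexity cpu-ms gc-ms).
  (define (redistribute points)
    (let ((total-c  (apply + (map car points)))
          (total-gc (apply + (map caddr points))))
      (map (lambda (p)
             (let ((share (* total-gc (/ (car p) total-c))))
               ;; adjusted time: subtract the point's own gc time,
               ;; then add back its proportional share of the total
               (list (car p) (+ (- (cadr p) (caddr p)) share))))
           points)))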
But when I thought about it, this would really only amount to
lowering the execution-time-minus-GC-time curve, since by definition
I'd be distributing the GC time evenly across the data points.  And
it can't be the case that a function's performance is fairly
represented by eliminating design differences that do a better or
worse job of coping with the underlying GC's behavior.

But I was looking for a metric which is commonly accepted, so if a
simple average over many iterations is pretty much the standard,
then that's what I'll do.
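
(With the measure sketch above, that would just be dividing the
summed time by the iteration count, e.g.
(/ (cadr (measure fn lst 1000)) 1000) for the mean cpu milliseconds
per call.)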

Your second thought on what I might have been asking is interesting,
but my immediate question was how to equitably benchmark functions
against each other on the same implementation.  However, for
cross-platform comparisons it would be interesting to have some
metric of how well a collector does at distributing GC time,
because, as I've come to find out, on the few implementations I've
benchmarked this varies widely.  So much so that I got results back
from one compiler that weren't monotonically increasing with list
complexity, and, as you suggest happens at high load factors, the
results did indeed become more erratic as the lists grew more and
more complex, until finally the heap's ability to expand was
exhausted, even though I know there was free memory to be had.  It
came down to the load factor you speak of, which the implementation
wanted to keep at no more than 50% after a GC, which meant it had to
ask for a much larger chunk of memory than actually required.
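
To spell out the arithmetic: with load factor defined as reachable
data divided by total memory in use, a 50% post-GC ceiling forces
the heap to be at least twice the reachable data.  A small
illustrative sketch:

  ;; Load factor = reachable / total memory in use, so a maximum
  ;; post-gc load factor implies a minimum heap size.
  (define (required-heap reachable max-load-factor)
    (/ reachable max-load-factor))

  (required-heap 100 1/2)  ; => 200, i.e. twice the reachable data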
So whereas MzScheme was capable of handling lists of complexity =
3.86 x 10^8, the compiled version crashed and burned on lists
exceeding complexity = 1.93 x 10^8.  After graphing the compiled
version's data from the benchmark, the overall graph didn't look
believable, since the execution times would actually go down in
places even though the complexity had increased, because the prior
data point had an exaggerated execution time secondary to the
excessive GC time spent at that point.

It's clear now that my initial suspicion that this was a non-trivial
problem was correct.  You can only really make empirically based
statements about a function's performance when you restrict the test
data to a size small enough that the GC causes no significant side
effects, and it's almost fruitless to compare performance across
implementations that have different GCs, since each will have a
different range of heap sizes it can accommodate before
significantly impacting the run times reported for a function.

I very much appreciate your taking the time to reply.  I got my
answer, and more.

--kyle
airfoil at bellsouth dot net
schemekeys.blogspot.com


