<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type"> </head> <body bgcolor="#ffffff" text="#000000"> William D Clinger wrote: <blockquote cite="mid:E1I4DSg-0001c3-1Y@adara.ccs.neu.edu" type="cite"> <pre wrap="">Kyle Smith wrote: </pre> <blockquote type="cite"> <pre wrap="">With the adoption of generational GC's, the non-trivial problem of distributing GC time in benchmarking results is less of a factor. However, there remain Scheme implementations that have not yet implemented modern garbage collection algorithms, and that present a non-trivial problem for benchmarking comparative function execution time. </pre> </blockquote> <pre wrap=""> Benchmarking of garbage collectors is very difficult because there are so many things that should be held constant, but the collectors you want to benchmark are so different that they cannot be controlled in ways that would maintain those constants. </pre> <blockquote type="cite"> <pre wrap="">The problem becomes quite significant when as the percentage of reachable data in the heap increases, while the available memory decreases. </pre> </blockquote> <pre wrap=""> We call this the load factor (reachable data divided by total memory in use) or inverse load factor (total memory in use divided by reachable data). Just about any collector will be reasonably efficient (in an amortized sense) at low load factors, so you're mainly interested in the high load factors where benchmarking is most difficult. </pre> <blockquote type="cite"> <pre wrap="">I make these statements based on my own experience with attempts at benchmarking two different implementations that suffer this problem. In order to produce credible results, I need to account for the GC time in an equitable fashion, that would ideally distribute the GC in a temporal manner over real time of execution. After an ACM search, I didn't find anything that spoke to this problem specifically, Most of the literature, understandably, is focused on coming up with better algorithms. </pre> </blockquote> <pre wrap=""> There are two distinct things you might be talking about here. The simpler thing is that you might be interested in average amortized performance, in which case the usual approach is to repeat the benchmark hundreds of times in the same execution and take the average. On the other hand, you might be looking for some measure of the collector's ability to distribute gc time over execution well enough to avoid impacting the mutator's responsiveness. A search at Google Scholar on "garbage" and "minimum mutator utilization" will get you started on that topic. Will </pre> </blockquote> Thanks for you reply Will, I had come up with a simplistic algorithm for amortizing the performance, which involved execution of one function in the benchmark over the full input data set, that comprised lists of increasing complexity where I measure the complexity with a simple metric that equals the sum of all pairs? and atoms? in a list.  I collect the sum of time of execution for some appropriately large number of iterations, typically a thousand, which would give me a tuple of data for each function and list of known complexity, where the tuple included the time of execution as well as the GC time encountered for that data point. Then the idea was that I could use these tuples along with the sum of complexity over all lists processed as well as the sum of GC time expended over the entire data set, to distribute the GC time back to a data point in relationship to the ratio of the list complexity for that data point to the total complexity.  But when I thought about it, this would really only amount to lowering the execution time minus gc time curve, since by definition, I'd be distributing the gc time evenly across the data points.  And it can't be the case that a functions performance is fairly represented by eliminating design differences that do a better or worse job at coping with the underlying GC's behavior. But I was looking for a metric which is commonly accepted, so if a simple average over many iterations is pretty much the standard, then thats what I'll do. Your second thought on what I might have been asking is interesting, but my immediate question was how to equitably benchmark functions on the same implementation against each other.  However, it would be interesting, for cross platform comparisons to have some metric on how well a collector does at distributing gc time, because as I've come to find out, on the few implementations I've benchmarked, this metric is widely variable. So much so, that I got results back from one compiler that weren't monotonically increasing with list complexity, and as you state it did indeed become more erratic as the list got more and more complex, until finally the ability of the heap to expand was exhausted, even though I know there was free memory to be had.  It came down to the load factor you speak of, which the implementation wanted to keep at no more than 50% after a GC, which meant they had to ask for a much larger chunk of memory than actually required.  So where as MzScheme was capable of handling lists of complexity = 3.86 x 10^8, the compiled version crashed and burned at lists exceeding complexity = 1.93 x 10^8.  After graphing the compiled versions data from the benchmark, the overall graph didn't look believable since the executions times would actually go down in places even though the complexity had increased, because the prior data point had an exaggerated execution time secondary to the excessive GC time spent at that point. Its clear now that my initial suspicion that this was a non-trivial problem was correct.  You can only really make empirically based statements about a functions performance when you restrict the test data to a size sufficiently small enough that you don't have significant side effect due to the GC, and its almost fruitless to compare performance across implementations that have different GC's, since they'll each have a different range of heap size that they can accommodate before they significantly impact the run times reported for a function. I very much appreciate your time to reply.   I got my answer,  and more. --kyle airfoil at bellsouth dot net schemekeys.blogspot.com </body> </html>