[plt-scheme] how much time it takes to compile scribbles to you?

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Sat Apr 19 10:20:28 EDT 2008

At Sat, 19 Apr 2008 08:04:46 +0200, "Marco Maggi" wrote:
> once again I tried to install 3.99 with CGC, this time I went a
> little further but still no full success. Everything goes fine with
> "make", and "make install" installs the libraries, "mzscheme",
> "mred", the collections, but then goes down the hole scribbling
> documentation.
> 
> It takes, literally, hours and finally it dumps core.
> 
> Is it normal that scribbling takes so much time? Or is there
> some monotonically increasing f* up going on because of CGC? 

Yes, I think CGC is to blame, and those long hours were spent paging.

I haven't tried setup-plt with CGC in a long time, and when I tried it
just now, I was not able to complete a full setup-plt build using CGC.
My machine is an Intel Mac with 1.25 GB of memory.

With CGC, memory use creeps up to 370MB while building bytecode.
Running Scribble documents (not yet rendering) pushes memory use to
490MB overall for the first pass. The first rendering pass ends at
680MB, and the second pushes memory use up to 730MB. Then ProfJ
starts compiling Java code, and memory use grows further. At 900MB, my
machine started paging (CPU use dropped to 4%), so I killed the build.

The Scribbling part is probably especially bad because some documents
evaluate examples while building docs, and each evaluation uses a
sandbox, which means that new threads and namespaces are being created
all the time; MzScheme threads are particularly troublesome for CGC.


With 3m, peak memory use for setup-plt is around 230MB; it oscillates
between about 50MB and 150MB for most of `setup-plt'. In the Scribbling
phase, it stays mostly between 150MB and 200MB, peaking while rendering
the big reference manual.


Unfortunately, the difference between 3m and CGC is consistent with
everything we know about GC. CGC fares poorly on long-running
applications that have lots of internal variety, and as Setup PLT takes
on more and more installation jobs, it runs longer and with more
variety.

One way around the problem may be to just restart setup-plt when it
starts Scribbling, so that you begin that phase with much less
accumulated garbage.

The other way around it is to use 3m. Are you using CGC only because 3m
isn't supported on your platform? If so, we should try to support it.


Meanwhile, you may wonder why it takes 200MB to render a manual to
HTML, or 150MB simply to generate bytecode, even with a GC that works.
I think it has to do with the huge amount of information stored with
syntax objects: lots of information is accumulated through layers of
macro expansion, each of which picks up a pile of lexical-context
information. Probably the whole thing could run in half the memory;
I've cut memory use in half in the past through painstaking work to
track down an inefficient data structure, and I'll keep investigating.
A more substantial improvement probably requires fundamentally new
macro technology.


Matthew



Posted on the users mailing list.