[plt-dev] Re: problem with optimistic compilation

From: Faré (fahree at gmail.com)
Date: Thu Aug 13 09:14:24 EDT 2009

All of these issues would be solved if mzscheme took a *pure* approach
to managing compilation state: a given pathname, if it names an actual
file, can only ever hold one uniquely identified content.

Example implementation (used by OMake?): the name of the object file
is a hash based on the name of the source file, the hash of its
contents, the hash of every other dependency read during compilation,
and the hash of the implementation itself.
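
A minimal sketch of that naming scheme, in Python for concreteness. The
function name, the choice of SHA-256, and the ".zo" suffix are assumptions
made for illustration; nothing here is taken from OMake or mzscheme.

```python
import hashlib


def object_name(source_path, source_bytes, dependency_bytes, implementation_id):
    """Derive an object-file name from everything that can affect compilation.

    All parameters are hypothetical: `dependency_bytes` is the list of
    contents of every other file read while compiling, and
    `implementation_id` is any byte string identifying the compiler build.
    """
    parts = [
        source_path.encode("utf-8"),
        hashlib.sha256(source_bytes).digest(),
        *[hashlib.sha256(dep).digest() for dep in dependency_bytes],
        hashlib.sha256(implementation_id).digest(),
    ]
    outer = hashlib.sha256()
    for part in parts:
        # Length-prefix each part so that different splits of the same
        # bytes cannot produce the same digest.
        outer.update(len(part).to_bytes(8, "big"))
        outer.update(part)
    return outer.hexdigest() + ".zo"  # suffix is illustrative only
```

With names built this way, a stale object file is never picked up by
mistake: any change to the source, a dependency, or the implementation
yields a different name, so the old file is simply not looked at.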

Simpler approximation (used by cl-launch): include in the directory name
for your object cache the version of the implementation, the
architecture, the path of the source file, etc., and otherwise use
timestamps to detect that source files have changed.
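
The approximation can be sketched as follows, again in Python. This is
not how cl-launch is written; the directory layout, the ".obj" suffix,
and both function names are assumptions made for illustration.

```python
import os
from pathlib import Path


def cache_path(cache_root, impl_version, arch, source):
    """Place the object file under <root>/<version>/<arch>/<source path>."""
    source = Path(source).resolve()
    # Drop the filesystem root so the absolute source path nests under
    # the cache directory instead of replacing it.
    relative = source.relative_to(source.anchor)
    return (Path(cache_root) / impl_version / arch
            / relative.parent / (relative.name + ".obj"))


def is_stale(source, compiled):
    """Recompile when the object is missing or older than its source."""
    try:
        compiled_mtime = os.stat(compiled).st_mtime
    except FileNotFoundError:
        return True
    return compiled_mtime < os.stat(source).st_mtime
```

Version and architecture are separated by construction, so only the
"has the source changed?" question is left to timestamps, with the
fragility that implies.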

Note that with a simple atomicity trick (atomic rename of a temporary
file) you also solve all the concurrency issues (e.g. an NFS home
directory in which several versions of DrScheme, running on multiple
machines, try to share their cache).
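
The rename trick might look like this in Python; `publish_atomically`
is a hypothetical name. The sketch assumes the temporary file lives in
the same directory as the target, since a rename is only atomic within
one filesystem, and it does not show how any particular network
filesystem behaves.

```python
import os
import tempfile


def publish_atomically(final_path, data):
    """Write `data` to a temporary file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    os.makedirs(directory, exist_ok=True)
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # Readers see either the old file or the complete new one,
        # never a partially written file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Combined with content-derived names, two processes racing to publish
the same object would write identical bytes, so it does not matter
which rename lands last.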

Functional purity can make whole categories of problems disappear. Whodathunk?

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
When a pamphlet was published entitled "100 Authors Against Einstein", Einstein
retorted "If I were wrong, one would be enough."

2009/8/13 Matthew Flatt <mflatt at cs.utah.edu>:
> At Wed, 12 Aug 2009 18:49:34 -0500, Robby Findler wrote:
>> On Wed, Aug 12, 2009 at 4:21 PM, Sam TH<samth at ccs.neu.edu> wrote:
>> > What if mzscheme-compiled files already exist?  Does it use those?
>> Yes. With preference given to the drscheme ones.
>> > If
>> > it doesn't, won't it recompile the entire collects tree the first time
>> > you try to do anything?
>> No, the "don't save any compiled files for anything in the collects
>> tree" caveat is still in place, as before.
>> > Also, this seems like it's papering over the problem.  We should be
>> > able to come up with a solution that works for both DrScheme and
>> > mzscheme.
>> I'll leave this one to Matthew to try to explain. I think you're
>> probably wrong, but I'm not sure precisely why. It is a big hairy
>> mess.
> And I'm not sure precisely why, either. This is one of those areas where I
> suffer from the same affliction as many sysadmins: to users, there are
> many changes that would be obviously better, but I've been burnt by so
> many obvious changes that I'm reluctant to try more. Lots of competing
> demands have been balanced through a slow evolution, and then lots of
> other strategies and techniques evolve to fit that design; I'm leery of
> re-living it all for a different design point.
> Here's an attempt to list relevant issues and competing demands on the
> general issue of source and compiled files:
>  * Timestamps allow relatively efficient tracking of dependencies, but
>   timestamps are also somewhat fragile.
>  * There's a significant run-time cost to checking timestamps and/or
>   following a search path to locate the "best" version of a file.
>  * Search paths and other rules that can "fail" silently create
>   mystery. (Why does my program take so long to load? Oh, I need to
>   recompile file X.... But it is compiled, and the timestamp is later!
>   Oh, it's apparently the wrong version.)
>  * Some directories are writable and some are not. Some directories are
>   writable but aren't really intended to be modified.
>  * Sometimes `mzscheme' is used in development mode and sometimes in
>   execution mode. (Recall how we eventually learned that defaulting to
>   development mode and requiring "-q" for execution mode was a bad
>   idea.)
>  * Sometimes files don't exist for required modules, even if they're
>   named through collection paths. (I have in mind the modules that are
>   in a "stand-alone" executable.)
>  * Programs might be loaded concurrently on the same filesystem, and
>   synchronizing them is a pain, at best. (Are we at least past the
>   days of NFS, where you don't get the normal filesystem atomicity
>   guarantees?)
>  * Different versions might be used.
>  * Different compilation options might be used, and they may allow
>   different development and/or deployment possibilities. (For example,
>   `enter!' can re-load changed modules only when they are compiled
>   with `compile-enforce-module-constants' set to #f, but that same
>   setting has a negative effect on performance.)
>  * Sometimes different compilation options are used in the same
>   program, and the user needs some control over which modules use
>   which options. (Currently, our tools choose to do X or Y based on
>   whether a compiled file exists for a module.)
>  * Using multiple namespaces or other parameter-based configurations
>   can easily collide.
>  * Sometimes you want to refer to compiled files outside of PLT Scheme.
>   (For example, I often write makefiles that use `mzc' and then
>   trigger other actions based on the timestamp of the compiled file
>   --- and that would be more difficult if the compiled-file path were
>   version-specific, though maybe I should not be doing that in
>   makefiles.)
>  * When version-specific files are generated and users upgrade
>   frequently, the filesystem can become littered with useless files
>   from old versions. (This bugs me about Planet, but it's all in one
>   place, so I can clean up easily enough.)
>  * Although there are many cases where changing a module forces
>   recompilation of importing modules, there are also many cases where
>   the new module can be used from source without recompiling
>   everything that depends on it.
>  * Bootstrapping is tricky. (I sometimes get into trouble by using
>   `mzc' after I change a file in "collects/scheme" without recompiling
>   everything. Because of the way that compilation hooks into the
>   module-loading process, and because `mzc' itself uses the changed
>   files, running `mzc' multiple times doesn't converge in a nice way.)
>  * Some people want to distribute bytecode files without source.
> I'm sure the list is incomplete, but that's all I can think of. In any
> case, if someone believes that the current approach to bytecode files
> is fundamentally wrong, the above list may be useful. My sense is that
> a better approach is out there, but that it's only slightly better and
> not worth the effort to get from here to there; then again, that sense
> is based more on vague impressions from (limited) experience, rather
> than any solid technical argument.
> On the specific issue of how the changes in DrScheme relate to
> MzScheme, though, I agree that the DrScheme capability should be
> available in MzScheme in some sort of development mode. As Robby noted,
> you can get `mzc'-like automatic compilation of files by adding
>  (require compiler/cm)
>  (current-load/use-compiled
>   (make-compilation-manager-load/use-compiled-handler))
> to your ".mzschemerc". I think it would make sense to have DrScheme's
> extra tools (i.e., to compile only sources not in the main "collects")
> similarly available in library form.

Posted on the dev mailing list.