[racket-dev] submodules

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Mar 7 12:14:35 EST 2012

I've added "submodules" to a version of Racket labeled v5.2.900.1
that's here:

 https://github.com/mflatt/submodules

After we've sorted out any controversial parts of the design and after
the documentation is complete, then I'll be ready to merge to the main
Racket repo.


Why Submodules?
---------------

Using submodules, you can abstract (via macros) over a set of modules
that have distinct dynamic extents and/or bytecode load times. You can
also get a private communication channel (via binding) from a module
to its submodules.

Some uses:

 * When you run a module via `racket', if it has a `main' submodule,
   then that `main' submodule is instantiated --- but not the `main'
   submodules of any other modules used by the starting module.  This
   protocol is implemented for `racket', but not yet for DrRacket.

 * Languages with separate read-time, configure-time, and run-time
   code can be defined in a single module, with the configure-time and
   read-time code in submodules.

 * A testing macro could collect test cases and put them into a
   separate `test' submodule, so that testing code is not run or even
   loaded when the module is used normally.

 * An improved `scribble/srcdoc' can expose documentation through a
   submodule instead of through re-expansion hacks.

 * If you want to export some of a module's bindings only when
   explicitly requested (i.e., not when the module is `require'd
   normally), you can export the bindings from a submodule, instead.

When I first started talking about these problems last summer, I
called the solution sketch "facets" or "modulets", but the design
has evolved into "submodules".


Nesting `module'
----------------

Given the term "submodule", the first thing that you're likely to try
will work as expected:

  #lang racket/base

  (module zoo racket/base
    (provide tiger)
    (define tiger "Tony"))

  (require 'zoo)

  tiger

Within `module', a module path of the form `(quote id)' refers to the
submodule `id', if any. If there's no such submodule, then `(quote
id)' refers to an interactively declared module, as before.

Submodules can be nested. To access a submodule from outside the
enclosing module, use the `submod' module path form:

  #lang racket/base

  (module zoo racket/base
    (module monkey-house racket/base
      (provide monkey)
      (define monkey "Curious George"))
    (displayln "Ticket, please"))

  (require (submod 'zoo monkey-house))

  monkey

The 'zoo module path above is really a shorthand for `(submod "."
zoo)', where "." means the enclosing module and `zoo' is its
submodule. You could write `(submod "." zoo monkey-house)' in
place of `(submod 'zoo monkey-house)'.

Note that `zoo' and `monkey-house' are not bound as identifiers in the
module above --- just like `module' doesn't add any top-level
bindings. The namespace of modules remains separate from the namespace
of variables and syntax. Along those lines, submodules are not
explicitly exported, because they are implicitly public.

When you run the above program, "Ticket, please" is *not* displayed.
Unless a module `require's a submodule, instantiating the module does
not instantiate the submodule. Similarly, instantiating a submodule
does not imply instantiating its enclosing module.

Furthermore, if you compile the above example to bytecode and run it,
the bytecode for `zoo' is not loaded. Only the bytecode for the
top-level module and `monkey-house' is loaded.


Nesting `module*'
-----------------

Submodules declared with `module' are declared locally while expanding
a module body, which means that the submodules can be `require'd
afterward by the enclosing module. This ordering means, however, that
the submodule cannot `require' the enclosing module. The submodule
also sees no bindings of the enclosing module; it starts with an empty
lexical context.

The `module*' form is like `module', but it can be used only for
submodules, and it defers the submodule's expansion until after the
enclosing module is otherwise expanded. As a result, a submodule using
`module*' can `require' its enclosing module, while the enclosing
module cannot require the submodule.

A ".." in a `submod' form goes up the submodule hierarchy, so that
`(submod "." "..")' is a reference to the enclosing module:

  #lang racket/base

  (module aquarium racket/base
    (provide fish)
    (define fish '(1 2))

    (module* book racket/base
      (require (submod "." ".."))
      (append fish '(red blue))))

  (require (submod 'aquarium book))

Instead of `require'ing its enclosing module, a `module*' form can use
`#f' as its language, in which case its lexical context starts with
all of the bindings of the enclosing module (implicitly imported)
instead of with an empty lexical context. As a result, the submodule
can access bindings of the enclosing module that are not exported:

  #lang racket/base

  (module aquarium racket/base
    (define fish '(1 2))

    (module* book #f
      (append fish '(red blue))))

  (require (submod 'aquarium book))
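
The same mechanism covers the earlier use case of exporting bindings
only on explicit request. A minimal sketch, assuming a hypothetical
`internals' submodule name and a made-up `feed' function (neither is
part of the design itself):

```racket
#lang racket/base

(module aquarium racket/base
  (provide fish)
  (define fish '(1 2))

  ;; Internal helper: not exported by `aquarium` itself.
  (define (feed food) (cons food fish))

  ;; Hypothetical submodule that re-exports the internal binding.
  ;; With `#f` as the language, it sees all of `aquarium`'s bindings.
  (module* internals #f
    (provide feed)))

;; A normal `require` gets only `fish`; `feed` has to be requested
;; explicitly through the submodule.
(require 'aquarium
         (submod 'aquarium internals))

(feed 'worm)
```

Clients that only `(require 'aquarium)' never see `feed'.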

A common use of `module*' is likely to be with `main', since `racket'
will load a `main' submodule (after `require'ing its enclosing module)
for a module named on its command line. For example, if you run this
program via `racket':

  #lang racket/base

  (provide fish)
  (define fish '(2 1))

  (module* main #f
    (unless (apply < fish)
      (error "fish are not sorted")))

then you get a "fish are not sorted" error, but if you `require' the
file into another program, you get a `fish' binding with no error.


The new `#lang'
---------------

The `#lang' reader form was previously defined as a shorthand for
`#reader' where the name after the `#lang' is mangled by adding
"/lang/reader".  With submodules, `#lang' first tries using the name
as-is and checking for a `reader' submodule. If one is found, then the
submodule is used instead of mangling the name with "/lang/reader";
otherwise, `#lang' falls back to the old behavior.

So, if you want to define an `ocean' language that is `racket/base'
plus `fish', it's enough to install the following module as "main.rkt"
in an "ocean" collection:

  #lang racket/base

  (provide (all-from-out racket/base)
           fish)
  (define fish '(1 2 3))

  (module reader syntax/module-reader 
    #:language 'ocean)


Backwards Incompatibility
-------------------------

The biggest incompatibility is that `resolved-module-path-name' can
return a list when the module path refers to a submodule, in addition
to the old path and symbol results. Most code that calls
`resolved-module-path-name' will have to be updated.
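
As a rough illustration of the kind of update involved, here is a
sketch of a helper that handles both the old results and the new list
result. The helper name `split-resolved-name' is hypothetical, and the
sketch assumes that `make-resolved-module-path' accepts a list of the
same shape that `resolved-module-path-name' returns:

```racket
#lang racket/base

;; Hypothetical helper (not part of Racket): splits a resolved module
;; name into the enclosing module's name and a list of submodule names.
(define (split-resolved-name rmp)
  (define name (resolved-module-path-name rmp))
  (if (pair? name)
      (values (car name) (cdr name))  ; submodule: (base sub ...)
      (values name '())))             ; plain path or symbol, as before

;; Assumption: a list is accepted here to name a submodule.
(define-values (base subs)
  (split-resolved-name
   (make-resolved-module-path '(zoo monkey-house))))
```

Code that previously assumed a path or symbol can then keep working
on `base' and decide separately what to do with `subs'.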

The `submod' form is a new primitive module-path form, so module name
resolvers also must be updated.  Finally, a load/use-compiled handler
must accept a list as the expected-module name, which usually
indicates that a submodule is being loaded; the list can start with
`#f' to indicate that the module should only be loaded if it can be
loaded independently from bytecode (i.e., without triggering the
declaration of any other submodule, which means not loading from
source). Furthermore, when a submodule is requested, no error should
be raised if the enclosing module is unavailable, which allows
speculative checking for submodule declarations.

The bytecode format has changed, and the `mod' structure type from
`compiler/zo-parse' has two new fields: one for "pre" submodules
(i.e., those declared with `module') and one for "post" submodules
(i.e., those declared with `module*'). Any code that uses
`compiler/zo-parse' will have to change.

If you compile a `module' form and it has submodules, then when you
write the bytecode, all of the modules are written together. If the
`module' is not inside a larger top-level sequence, then the printed
form starts with a table that can be used to find any individual
submodule, which is how independent loading of submodules works. If
you just `read' the table in, though, it returns a compiled-module
value that contains submodules, and `eval'ing the compiled module
declares all the submodules, too. This protocol makes lots of
`compile' and `eval' code work without modification. The
`get-module-code' function from `syntax/modcode', meanwhile, gives you
more control, along with functions like `module-compiled-submodules' to
get or adjust the submodule list in a compiled-module value.


Design Issues
-------------

The `submod' syntax --- especially "." and ".." --- is arbitrary. The
`submod' name isn't great, but I like it the best among the options
that I tried.  I'm not sure whether the association of "."  and ".."
to filesystem paths is helpfully mnemonic or unhelpfully
confusing. The handling of `quote' paths within a module is also
arbitrary, but it's intended to smooth the connection between the top
level and a module body.

Overloading `module' for submodules is questionable; again, though, I
like how it roughly matches interactive evaluation. For the
post-submodule form, then, `module*' seems like the obvious
choice.

As things stand, the ugly pattern `(module* main #f ...)'  would be
common. Probably we should have a macro that expands to `(module* main
#f ...)'. Should the macro be called `main'?
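
One possible shape for such a macro, purely as a sketch: the name
`main' is just the suggestion floated above, and I'm assuming that a
plain pattern-based macro is enough here.

```racket
#lang racket/base

;; Hypothetical macro: wraps its body in a `main` submodule that sees
;; all of the enclosing module's bindings (language `#f`).
(define-syntax-rule (main body ...)
  (module* main #f body ...))

(provide fish)
(define fish '(1 2))

;; Intended to run only when this file is the one given to `racket`.
(main
  (unless (apply < fish)
    (error "fish are not sorted"))
  (displayln "fish are sorted"))
```

The macro name and the submodule name live in separate namespaces, so
calling both `main' should not cause a conflict.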

I haven't tried to build a test-collecting macro or a
`scribble/srcdoc' replacement. I think they will work with this
submodule design, but I can't be sure until we try it.


Posted on the dev mailing list.