[racket-dev] proposal for moving to packages

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Mon May 20 16:42:15 EDT 2013

I used to think that we'd take advantage of the package manager by
gradually pulling parts out of the Racket git repo and making them
packages.

Now, I think we should just shift directly to a small-ish Racket core,
making everything else a package immediately. "Core" means enough to
run `raco pkg'.

A key point to remember is that "package" does not mean "omitted from
the distribution". Instead, we'll construct a distribution by
combining the core with a selected set of packages. Initially the
selected set of packages will cover everything in the current
distribution.

Jay and I have been lining up the pieces for this change (it's
difficult to make a meaningful proposal without trying a lot of the
work, first), and I provide a sketch of the overall plan below.

This plan has two prominent implications:

 * The current git repo's directory structure will change.

   Anyone who currently works with the Racket repo will need to adapt
   to the new directory structure (and probably git submodules in the
   future). All of the code currently in the Racket git repo will stay
   there (for now), but using it will involve at least one new step:
   linking packages within the repo into the core build --- probably
   by running some setup script.

 * The main Racket distributions at http://racket-lang.org/download/
   will omit sources, including ".rkt" files, ".scrbl" files, and
   tests.

   Sources will remain readily available through the git repo and
   through the package manager, but getting users to try a source-code
   change will be less convenient than now. See "Binary Builds" below.


Repository Reorganization
-------------------------

To convert the current monolith into a core plus packages, we propose
to reorganize the Racket git repository by

 1. pushing the current content into a "core" subdirectory, and

 2. lifting pieces back out of "core" and into new subdirectories, one
    for each package.

The resulting repo will have top-level directories with names like
"core", "scribble", "gui", "slideshow", "drracket", and so on. Each
directory other than "core" corresponds to a package.

We'll have to try this out to discover how finely we can break up the
existing tree into packages. At worst, the "mr", "dr", and "plt"
layers of "dist-specs.rkt" should work, but I think we'll be able to
do better than that.
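
As a rough illustration of the linking step mentioned earlier, a setup
script could walk the top-level directories and link each one other
than "core" as a package. The sketch below only builds the command
lines and does not run them. The function name `link_commands' is
hypothetical, the real script may well be written in Racket, and the
use of `raco pkg install --link' for this step is an assumption.

```python
import os

def link_commands(repo_root):
    """Build one `raco pkg install --link` command per package directory.

    Assumes every top-level directory other than "core" (and hidden
    directories such as ".git") is a package, as in the proposed layout.
    """
    cmds = []
    for name in sorted(os.listdir(repo_root)):
        path = os.path.join(repo_root, name)
        if name == "core" or name.startswith("."):
            continue
        if not os.path.isdir(path):
            continue
        cmds.append(["raco", "pkg", "install", "--link", path])
    return cmds
```

Each returned list could be handed to `subprocess.run' once the core
build is in place.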

Eventually, when the dust settles, I think we'll want to convert every
directory to its own git repo, and then we can incorporate the
individual repos as git submodules.

Rearranging the repo will obviously break the current build
system. Jay and I are creating a new build system, so the current
nightly build and distribution processes do not need to adapt
(although we're using many existing pieces). The new build system
might be ready by the end of the week (and the repo reorganization
will wait until the build system is ready).


Binary Builds
-------------

The proposed switch to binary distributions --- instead of always
including source alongside generated bytecode and documentation --- is
aimed at reducing dependencies between packages. Support for binary
packages is also aimed at supporting faster installs.

In terms of dependencies, documentation for a library usually has more
dependencies than the library itself. We don't want to limit the
*documentation* for package X to avoid using or referring to package Y
libraries in order to avoid a run-time dependency of X on Y. For that
matter, we don't want to avoid documenting X in order to avoid a
dependency on Scribble. A library's tests similarly could have
dependencies that are not needed for the library itself.

We've adjusted `raco setup' and `raco pkg' to work with collections
and packages that are in binary form. "Binary" is not a specific
attribute of a package; it's just a package that happens to have ".zo"
files without corresponding ".rkt" files, documentation without
".scrbl" sources, and so on. The intent is not for programmers to
create binary packages, but to enable an automatic conversion of a
source package to a binary package. We can then set up different
catalog servers to serve source and binary versions of
packages. Finally, we'll be able to quickly create distributions ---
either the standard one or others --- by combining a core build with a
set of binary packages.
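
To make the source-to-binary conversion concrete, here is a minimal
sketch of the file-level part: remove each ".rkt" file that already has
bytecode. It assumes the default layout in which "x.rkt" compiles to
"compiled/x_rkt.zo" in the same directory. The function name
`strip_sources' is hypothetical, and the real conversion also handles
documentation, tests, and "info.rkt", which this sketch ignores.

```python
import os

def strip_sources(root):
    """Delete each .rkt file under root that has compiled bytecode.

    A source file "x.rkt" is removed only when "compiled/x_rkt.zo"
    exists next to it; sources without bytecode are kept.
    Returns the sorted list of removed paths, relative to root.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".rkt"):
                continue
            zo = os.path.join(dirpath, "compiled", name[:-4] + "_rkt.zo")
            if os.path.isfile(zo):
                src = os.path.join(dirpath, name)
                os.remove(src)
                removed.append(os.path.relpath(src, root))
    return sorted(removed)
```

Keeping sources that lack bytecode means a partially compiled tree
still works after stripping.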

Some drawbacks to omitting source are immediately apparent:

 - Users will be less able to make source changes on their systems to
   help us debug.

   Having the binary form of a package installed does not preclude
   "upgrading" to a source package. So, we could ask a user to use the
   package manager to install the source form of, say, the "drracket"
   package, and then try out a change. In that way, users can still
   help, but it will be less convenient.

 - Users will be less able to read installed code as examples.

   Our source code is now easily available via the web interfaces at
   http://git.racket-lang.org/ and GitHub, so users can always look
   there, instead.

It would be possible, of course, to support distributions and packages
that include both source and compiled forms (like our current
distribution), but that arrangement requires even more work. We'd like
to try out the simpler source vs. binary options, first.


More Detail
-----------

Here's our plan for a new repo and build process:

 * There's a Racket core, which will look a lot like this:

     https://github.com/mflatt/min-racket

   The core is intended to provide everything that is needed to run
   `raco pkg', which is the way to install anything else. The repo
   above is not yet minimal in that sense, but I think it's close.

   After `make install', a simple tool (probably a new mode for
   `setup/unixstyle-install') can copy a built tree to distribution
   form. The copy strips sources for which bytecode files exist.

   The core has no documentation or even documentation sources, and
   dropping sources for a distribution includes dropping "tests"
   subdirectories.

   In the distribution copy, the default package catalog is switched
   to a binary package server, instead of a source package
   server. (Developers can continue to work with an in-place build in
   package-source mode, as usual.)

 * For each package to be included in the distribution, a build machine
   installs the package and strips it to binary form, where "binary" form
   means that sources are removed while bytecode is kept, etc.

   A package's `deps' in "info.rkt" should describe all run-time
   dependencies. Additional build-time dependencies must be specified
   by `build-deps', so the complete set of dependencies when building
   a package from source is `deps' plus `build-deps'. (In the long
   run, we can add machinery to check these dependency declarations.)
   Stripping to binary form adjusts "info.rkt": the `build-deps' entry
   is dropped, and the `scribblings' entry is rewritten to
   `rendered-scribblings' to install pre-built documentation, and so
   on. Fields in "info.rkt" can fine-tune the stripping process for a
   package or collection. A package must not depend on anything in
   another package that is stripped away for binary form.

   Rendered documentation for a binary package redirects to a server
   for any link that goes outside the document. Every package's
   documentation therefore makes sense by itself, so it is easy to
   include rendered documentation in a package; meanwhile, the
   documentation server will be populated with built packages. When a
   package is installed with its documentation, links are redirected
   to local copies for any documentation that is locally installed.

 * On the N platforms for which we want to provide distributions, we
   build the core, install binary packages (i.e., the ones to be
   included in the distribution), and then convert to an installer.

   There should be no need for "dist-specs". The distinction between
   source and binary is mostly implicit and otherwise specified
   per-package as part of package stripping. A binary installer is
   packed directly from a binary installation.

   The build and installer-creation scripts will themselves be
   packages, so anyone can run them. Also, we hope to eventually offer
   a service that takes a set of packages and produces a set of
   installers that are preconfigured with the given packages.

   As a minor point, platform-specific packages must be created in
   binary form in the first place (i.e., there is no source form),
   since the creation of binary packages from source form will happen
   only on one platform. For example, the current `make'-time download
   of GUI libraries on Windows and Mac OS X will turn into
   platform-specific package dependencies, and the packages are
   straightforwardly created as binary in the first place.

 * There's just one core source distribution --- not different ones for
   Unix, Windows, and Mac --- that is derived directly from the git
   repo.

   We envision no distribution that includes both source and compiled
   bytecode/documentation, which is the form that our current
   distributions take. We could conceivably support such distributions
   one day, either by building from source on N machines or having a
   third kind of package that includes both source and compiled parts,
   but this is a good place to simplify at first.
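
The "info.rkt" adjustments described above can be sketched by modelling
the file's fields as a dictionary. This is only an illustration: the
function names and the package names in the usage example are
hypothetical, and carrying the `scribblings' value over unchanged is an
assumption, since the actual rewrite may also alter the entry's
contents.

```python
def source_build_deps(info):
    """Dependencies needed to build from source: deps plus build-deps."""
    return list(info.get("deps", [])) + list(info.get("build-deps", []))

def strip_info(info):
    """Return a copy of the info fields adjusted for binary form.

    The build-deps entry is dropped and scribblings is renamed to
    rendered-scribblings; every other field is copied as-is.
    """
    stripped = {}
    for key, value in info.items():
        if key == "build-deps":
            continue  # build-time dependencies are not needed at run time
        if key == "scribblings":
            stripped["rendered-scribblings"] = value
        else:
            stripped[key] = value
    return stripped
```

For example, with `deps' of ["pkg-x"] and `build-deps' of ["pkg-y"],
building from source needs both packages, while the stripped form
lists only "pkg-x".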


Some pieces yet to be implemented:

 - Stripping a core build to prepare for binary package installs
   (looks easy).

 - Submodule stripping when converting a package to binary form (looks
   easy).

 - Scripts and servers to drive (1) the core and package build once,
   and (2) the core builds, package installs, and installer bundles on
   various platforms.

Some complexities of the current build/bundle process that go away:

 - No "dist-specs"; no mz/mr/dr/plt spec.

 - No "info-domain" fixup when packaging a distribution.

 - No extracting of binaries from one platform and splicing them into a
   generic build shell (or construction of the generic build shell).

 - No "src" distribution variants.

Some complexities that stick around:

 - DESTDIR mangling and `setup/unixstyle-install' shuffling.

 - Some process for taking a pile of installers and putting them on a
   web page.

 - DrDr builds and tests from source, including ring-0 packages.


Posted on the dev mailing list.