[racket-dev] package-system update

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Sun Jul 14 08:42:26 EDT 2013

Thanks! (For one, I found the "From Back There to Here" section
particularly helpful.)

On Sat, Jul 13, 2013 at 1:56 PM, Matthew Flatt <mflatt at cs.utah.edu> wrote:

> Here's a big-picture update of where we are in the new package system
> and the conversion of the Racket distribution to use packages.
> This message covers
>  - how I see things working after the package system and
>    reorganization is done, and a report on what pieces are still
>    missing to reach that vision;
>  - a look at how we got to our current design/reorganization choices
>    and whether we're choosing the right place; and
>  - speculation on why the package changes have been so difficult to
>    implement.
> All of that makes it a long message (sorry!), but I hope this message
> is useful to bring us more in sync.
>
> A Package-Based Racket
> ----------------------
> Let's take a look at how you'll do various things in the new
> package-based Racket world.
> (There's no new information here, and parts marked with "[guess]" are
> especially speculative.  Still, some details may be clearer than in
> earlier accounts, now that much of it is implemented, and I think a
> comprehensive review may be useful.)
> ** Downloading release installers from PLT
> The "www.racket-lang.org" site's big blue button will provide the same
> installers that it does now, at least by default. That is, the content
> provided by the installer --- DrRacket, teaching languages, etc. ---
> will be pretty much the same as now.
> The blue button might also provide the option of "Minimal Racket"
> installers, which give you something that's as small as we can make it
> and still provides command-line `raco pkg'.
> ** Downloading installers from other distributors
> There are all sorts of reasons that the "main distribution" from PLT
> might not fit the needs of some group. Maybe the release cycle is too
> long or at the wrong time. Maybe it includes much too much, much too
> little, or almost the right amount but missing a crucial
> package. Maybe the group wants something almost minimal, but still
> with a graphical package manager. Maybe some group uses a platform for
> which PLT does not provide an installer.
> For many of those groups, using a "Minimal Racket" installer plus
> selective package installations will do the trick. For others,
> creating a special set of installers might be worthwhile, but there
> are too many reasons and too many permutations for PLT to provide
> installers that cover all of them.
> Fortunately, anyone can build a set of installers and put them on a
> web page, and we make it as easy as possible to build a set of
> installers that start with a given set of packages. PLT could host a
> web page or wiki that points to other distributors. PLT might even be
> able to provide an automated service that generates a set of
> installers for a basic set of platforms.
> ** Compiling a release from source
> In addition to installers, a download site can provide a source-code
> option (not specific to any platform, unlike the current source
> packages), which would mainly be used for building Racket on
> additional platforms.
> This option is mostly a snapshot of the source-code repository for the
> core, but it includes a pre-built "collects" tree (see "technical
> detail", below) and a default configuration that points back to the
> distributor's site for pre-built packages.
> ** Adding or upgrading supported packages
> In much the same way that you can easily install a set of supported
> packages on your current OS, you'll be able to easily install a set of
> packages that are supported by your distributor. Those packages are
> pre-built, so they install quickly, along with any included
> documentation.
> Depending on the distributor and installer, packages might be
> downloaded and installed in "binary" form, which means that tests and
> source code (for libraries and documentation) are omitted from the
> package. PLT seems unlikely to provide such installers in the near
> future.
> The default package scope configured by a distribution tends to be
> "user", which means that packages are installed in a user-specific
> location.
> Package updates can be made available by distributors for whatever
> reason and on whatever timetable they see fit.
> If your distribution is from PLT, then the supported packages are
> called "ring-0" packages. Ring-0 packages include contributions from
> third parties (i.e., not just packages implemented by PLT) that are
> vetted and regularly tested by PLT.
> [Guess:] The "Racket" and "Minimal Racket" distributions might point
> to different pre-built package catalogs. Possibly, the "Racket"
> catalog never updates packages that were included in the installer (on
> the grounds that the user may not have write permission to the
> install), while the "Minimal Racket" catalog includes more frequent
> updates for bug fixes (on the grounds that the user can update any
> installed package).
> A distributor doesn't necessarily have to provide its own package
> catalog. It can instead supply an installer that works with packages
> as served by some other distributor's catalog, such as PLT's
> catalog. (See "technical detail" below.)
> A user can also redirect `raco pkg' to a different catalog server,
> instead of using the configuration that was supplied by the
> installer. Binary, pre-built, and source variants of a package can be
> "updated" to each other in any direction.
> ** Adding or upgrading other packages
> An installer-provided configuration will normally point to a catalog
> of packages that are not specifically supported by the distributor but
> are still readily available --- probably mostly in source form and
> directly pulled from a git repository. In particular,
> "pkg.racket-lang.org" provides packages in source form.
> ** Reading documentation
> A distribution site provides online documentation (including all
> supported packages) alongside installers and packages.
> Many installers and packages include documentation to be installed on
> a user's machine, but there are some packages that provide libraries
> without documentation. For example, "gui-lib" provides GUI libraries
> without local documentation, while "gui" combines the "gui-lib"
> libraries with local documentation.
> Sometimes, documentation that is installed locally will still refer to
> documentation that is not downloaded. Such links are directed back to
> the distributor's site. That situation won't happen often for
> pre-built packages, because links that go to other packages will tend
> to go to packages that are dependencies. It will happen more for
> binary packages, because the dependency can be build-time only.
> ** Creating new packages
> A minimal package is a directory. So, let's suppose that you have some
> modules in a directory that you want to turn into a package. Suppose
> that your directory is called "potato", and it has a module file
> "eat.rkt".
> Turn your directory into a locally installed package with
>    raco pkg install --link potato
> Then, you can use "eat.rkt" with
>    (require potato/eat)
> To give your package to someone else, you could zip up the "potato"
> directory as "potato.zip", and the other person would install with
>    raco pkg install potato.zip
> Note that you can use any zip archiving tool, or you can use
>    raco pkg create --from-install potato
> to create the ".zip" file, which has the advantage that directories
> like "compiled" and ".git" are omitted.
> Even better, maybe your directory is already on GitHub at
> "http://github.com/idaho/potato". Then, others can install your
> package with
>    raco pkg install github://github.com/idaho/potato/master
> If you push changes to your GitHub repository, others can get them
> with
>    raco pkg update potato
> If you're ready for the world to use your package, then go to
> "pkg.racket-lang.org" and point the package name "potato" at your
> GitHub repository. Then, not only will others know about your package,
> they'll be able to install it with
>    raco pkg install potato
> Finally, if you'd like PLT to include your package as a pre-built
> package with each snapshot and release, then go back to
> "pkg.racket-lang.org" and request ring-0 status for the package.
> Ring-0 status may require a few bureaucratic improvements to your
> package, such as including an "info.rkt" file if you don't have one
> already, because those details are needed to keep your package in
> working order.
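
The "potato" walkthrough above can be sketched as a shell session. This is only an illustration under stated assumptions: the scratch directory, the body of "eat.rkt" (including the `eat` function), and the contents of "info.rkt" are all hypothetical, and the "info.rkt" is written in the `#lang info` form with a dependency on "base", which may not match what the tools expected when this message was written. The script only builds the directory; it prints the install command rather than running it.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch location; any directory would do.
workdir="$(mktemp -d)"
cd "$workdir"

# A minimal package is just a directory.
mkdir potato

# The module from the walkthrough. The `eat` function is invented
# here purely for illustration.
cat > potato/eat.rkt <<'EOF'
#lang racket/base
(provide eat)
(define (eat) "yum")
EOF

# Optional metadata of the kind ring-0 status might ask for.
# Assumed contents: a single-collection package that depends on "base".
cat > potato/info.rkt <<'EOF'
#lang info
(define collection "potato")
(define deps '("base"))
EOF

echo "package directory ready: $workdir/potato"
echo "next step (from $workdir): raco pkg install --link potato"
```

After linking, `(require potato/eat)` should find the module, as described above.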
> ** Using the cutting edge
> PLT provides one or more snapshot sites that work the same as the
> release site, except that each snapshot's catalog expires after a few
> days. When that catalog goes away, you can continue to use the
> snapshot, but you'll have to get packages and updates via source.
> ** Using the bleeding edge
> A user who wants to work with the minute-by-minute latest can start by
> cloning the core Racket git repository, `configure', `make', and `make
> install' to get a Minimal Racket build. Then, start installing
> packages with `raco pkg'.
> The default package catalog in built-from-source Racket is
> "pkg.racket-lang.org", which means that you get all packages in source
> form from various git repositories, including for PLT-maintained
> packages. The default package scope is "installation".
> If you run `raco pkg update -a', then you likely get updates and
> trigger many compiles. Eventually, an update will fail, because your
> core Racket version is too old, and you'll need to `git pull',
> `configure', `make', and `make install' --- if you haven't been doing
> that, anyway. Since packages were added with installation-wide scope,
> `make install' rebuilds your previously installed packages, too.
> ** Using the bleeding edge as a PLT developer
> As a convenience to PLT developers, who tend to work on a particular
> set of packages, there is an alternate way of working on the bleeding
> edge (which anyone can use, if they prefer).
> [Guess #1:] Instead of cloning the core Racket repo, clone a "main
> distribution" repo that has the core Racket repo as a submodule, plus
> git submodules for each of the packages that are dependencies of
> "main-distribution". In other words, you get something that looks like
> the current Racket repo, but that uses git submodules.
> [Guess #2:] Instead of cloning the core Racket repo from GitHub, you
> clone from the "main distribution" repository, just like now. In
> addition to being mirrored to GitHub directly, individual parts of the
> "main distribution" repo are mirrored as GitHub repositories, and
> the mirrors are the ones that "pkg.racket-lang.org" references.
> GitHub repositories that correspond to packages (submodules in guess
> #1, mirrored subtrees in guess #2) are registered with
> "pkg.racket-lang.org", which is how users on the bleeding-edge might
> normally get the packages.
> ** Becoming a distributor
> If you want to create installers like PLT's, then it's simplest to
> clone the git repo like a PLT developer, and then use `make
> installers'.
> Alternatively, you can use `make installers-from-catalog' to create a
> set of installers based on packages pulled from a specified catalog.
> Either way, if you want to piggy-back on some other installer's set of
> pre-built packages, then there will be configuration options and/or
> makefile targets to do that. (This is more sketchy; see below.)
> ** Taking your own snapshot of Racket and packages
> Sometimes, you don't need to build installers, but you'd still like a
> snapshot of the current Racket core and packages. You might want to
> edit the snapshot to upgrade some packages while keeping others the
> same.
> The `raco pkg catalog-copy' command is one of many tools to manipulate
> catalog servers. For packages that are mapped to GitHub repositories,
> merely copying a catalog doesn't archive the code, but it archives a
> particular commit id. It's always possible to grab a copy of a package
> repository and reference the copy from a catalog.
>
> A Technical Detail
> ------------------
> Starting from scratch twice with the same Racket sources does not lead
> to compatible pre-built packages, unfortunately, because bytecode
> files are not generated deterministically. Maybe we'll be able to fix
> that, one day.
> Meanwhile, pre-built packages depend on a particular build of the
> libraries in "collects", as well as a particular build of any
> dependencies. So, if a distributor wants to enable other distributors
> that use the same catalog of pre-built packages, the distributor must
> serve a "collects" tarball, too. Providing the "collects" will be
> built into the snapshot support.
>
> From Here to There
> ------------------
> The snapshot site
>    http://www.cs.utah.edu/plt/snapshots/
> demonstrates how a lot is working.
> Here are the remaining implementation issues:
>  * Generated distribution sites do not yet include a source code
>    option or "collects.tgz" for piggy-backing distributors, and the
>    makefile or configuration file lacks support for piggy-backing.
>    These seem straightforward to add.
>  * The PLT-maintained packages are not yet reflected on
>    "pkg.racket-lang.org".
>    Because all of those packages are currently in one big git
>    repository, it's not clear how to register the packages. Guesses #1
>    and #2 in "Using the bleeding edge as a PLT developer" above are two
>    possible routes. Another is that we set up a process to pull from
>    git and bundle package sources into individual zip archives that are
>    registered on "pkg.racket-lang.org".
>  * The `make installers' support needs to be less tied to
>    "main-distribution".
>    You can configure the set of packages that are built and included
>    in installers by `make installers', but that set currently must be
>    a subset of the packages in the "pkgs" directory of the Racket
>    repository. It's easy in principle to pull the packages from a
>    catalog server, but there will be some issues to sort out in the
>    bootstrapping process and in ensuring a consistent snapshot.
>  * No support yet for generated distribution sites with binary
>    packages.
>    Probably not too difficult. I forget what went wrong last time I
>    tried this, but a lot has been fixed since then. In any case, the
>    idea of binary packages does not seem to have gained much traction.
>  * Package-dependency checking for tests.
>    Maybe it's just a matter of compiling tests and sorting them into
>    suitable packages, like everything else, which is a direction that
>    we've already started.
>  * The "main-distribution" package needs to be cleaned up.
>    The "main-distribution" package currently includes tests, and it
>    includes packages like "honu" that are not in the current release.
>    This clean-up task is related to sorting out tests.
>  * Different build modes are not yet configured with different
>    default package scopes.
>    Should be easy.
> I also have a long-ish list of minor repairs and usability
> improvements to tackle.
>
> From Back There to Here
> -----------------------
> I think the big-picture plans are probably uncontroversial.
> When it comes to the details of exactly how things work and how things
> are named, I'm hearing less confidence or less agreement. Some of us
> are steeped in the issues and have different opinions. Others seem
> overwhelmed by the details, unsure of how it will all work out, and
> disconcerted by conflicting messages from others who seem to
> understand the issues. For people who are in that last group or close
> to it, it may seem overall that we're moving into a new package system
> too quickly.
> The decision to split Racket into packages has stressed our
> development process, because now we're tackling two hard problems
> instead of one: developing a package system and using it on a big pile
> of code. I think a good case could be made that the package system is
> too new to trust with a big shift. At the same time, my sense is that
> waiting until the package system is good enough isn't how software
> works; a piece of software becomes good enough for its job only when
> you make it do its job.
> From what I hear, the issues that make people uncomfortable fit into
> three categories:
>  * Package-system design
>  * Repository organization
>  * Concerns that a more distributed ecosystem means a less unified one
> Let's take them one at a time.
> ** Package-system design
> We all appreciate the work that Jay did to design the package
> system. I hear lingering concern about the design, including its
> limited support for versioning (just dependency checks), the fact that
> the package system is outside the module system (no built-in
> auto-download of packages, although a tool like DrRacket can suggest
> package installs in response to missing-library exceptions), its
> stance on conflicts (simply disallowed), and its flat namespace (which
> could make conflicts more frequent).
> On some of the points, I think reasonable people will disagree. We've
> had a years-long discussion, and we've been paying attention to
> precedents. We've explored some nearby alternatives to the current
> design (I'm thinking of single-collection versus multi-collection
> packages). I think we've gotten as close to consensus as possible.
> ** Repository organization
> As we try to split the Racket repository into packages, the questions
> concern how finely to split the repository and how to eventually
> allocate packages to source-code repositories.
> I think the initial split of the Racket repository went more smoothly
> than anyone expected. It was fairly easy, for example, to extract a
> relatively small core to run `raco pkg', or to draw a line between
> DrRacket and the teaching languages. I chalk that up to general
> competence among the Racket implementors: big systems must be
> developed in layers, whether the layers are declared or not.
> In fact, it has worked out so well that the splitting of Racket into
> packages has taken a more aggressive form than I expected. At this
> point, we've split the Racket repository into 137(!) packages, and
> that number is still growing. Two of us tried to make a coarser split,
> and it didn't feel right. Others have since started shuffling packages
> and continue to split things further. We seem to really like declaring
> dependencies and reducing unrequested functionality.
> Given that packages are going to be split finely, the question of
> allocating packages to repositories is less straightforward. We've
> concluded that "scribble-lib" and "scribble-doc" are good to have
> as separate packages, but we don't want Scribble's
> implementation and its documentation to end up in separate
> source-code repositories. At the same time, putting everything in one
> big repository is intractable, at least at the point where we want
> packages downloaded directly from a repository. (A package can be a
> subdirectory of a repository, but the package manager has to download
> a tarball of the entire tree to extract the subdirectory.) So, under
> "pkgs", we have an extra layer in the directory hierarchy to reflect
> an intended organization into repositories. Using a layer of
> directories is consistent with git submodules, if we choose to go that
> way.
> The fact that many of us have tried and arrived at the same conclusion
> on granularity gives me confidence that it's a reasonable conclusion,
> but the current Racket repository organization really does feel
> complex. For example, the core of `raco setup' is
>    racket/lib/collects/setup/setup-unit.rkt
> while the Scribble part of `raco setup' is in
>    pkgs/racket-pkgs/racket-index/setup/scribble.rkt
> Those paths reflect that `raco setup' is mostly core functionality,
> but you don't get documentation setup until you install the
> "racket-index" package, which is currently grouped with other
> almost-core packages.
> This example also illustrates how the current organization relies on
> collection splicing in a big way. In the long run, not many
> collections are going to be spliced so much as, say, "racket" and
> "data", but splicing two or three times to separate modules,
> documentation, and tests may turn out to be common.
> And then there's
>    pkgs/drracket-pkgs/drracket/drracket/drracket.rkt
>              ^          ^         ^          ^
>             repo      package  collection  module
> Every layer before a "/" has multiple descendants, so the layers are
> not trivially collapsed. If you just look at the path, it seems
> crazy. But if you're expecting <repo>/<package>/<collection>/<module>,
> then hopefully it seems reasonable.
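
As a small illustration of reading a path as <repo>/<package>/<collection>/<module>, the sketch below splits the example path with plain shell. The variable names are made up for this note, and it assumes the path starts with "pkgs/" and has exactly one directory level for the collection, which will not hold for nested collections.

```shell
#!/bin/sh
set -eu

# The example path from the message.
path="pkgs/drracket-pkgs/drracket/drracket/drracket.rkt"

# Drop the leading "pkgs/" layer, then split the rest on "/".
rest="${path#pkgs/}"
IFS=/ read -r repo package collection module <<EOF
$rest
EOF

echo "repo:       $repo"
echo "package:    $package"
echo "collection: $collection"
echo "module:     $module"
```

Because `read` gives any leftover fields to the last variable, a module in a sub-collection would show up with its subdirectory still attached to `module`.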
> In short, the current layout is driven by three factors: a bias toward
> fine-grained packages, a sense that it's good to reflect layers and
> dependencies via separate filesystem directories, and some constraints
> on how directories relate to git repositories. Unless we change those
> driving factors, I don't see us arriving at a simpler organization.
> ** Distributed versus unified ecosystem
> While less prominent than the other categories, I'm also hearing some
> concern that splitting up the Racket repository and reorganizing
> various pieces of infrastructure will lead to a less unified system
> --- or even a less unified community.
> Moving our products and infrastructure into a more distributed form is
> one of my main goals, but I don't think that "distributed" has to mean
> "fragmented". It seems to me that the more distributed we are able to
> make our world (the Internet, git, etc.), the more closely we are able
> to work together. The math behind that effect eludes me, but I believe
> in it, anyway.
> At the same time, the sudden emphasis on reorganizing the Racket
> repository could also give the impression that the new package system
> is primarily about distributing Racket, and not about "third-party"
> libraries and packages. I think we're trying to have as much of our
> code as possible treated as "third-party", and thus ensure that all
> parties are well supported.
>
> Why Aren't We There, Yet?
> -------------------------
> We're hardly the first to design a package system or apply it to a big
> system, and I can't shake the sense most of the time that we're just
> reinventing the wheel. Along those lines, implementing the mechanics
> of the package system has been suspiciously difficult.
> I hope that part of the reason is our commitment to documentation ---
> that it exists, that it builds reliably, that it's richly formatted,
> and that it is pervasively cross-referenced and hyperlinked. I don't
> think that any package system delivers documentation that's anything
> like ours.
> Could it also be an unusual commitment to relative paths, especially
> when distributing pre-built items? A lot of problems go away if you
> know that the library is going to be in "/usr/local/lib".
> Surely part of it is trying to make `raco setup' fast for installing
> packages. It's complex and fragile to perform an incremental
> computation based on changes inferred from filesystem state.
> Bootstrapping, at least, is known to be tricky. The Racket compiler
> isn't written in Racket, yet, but the installer-creator installs
> Racket packages to create a local installation that is used to set up
> packages on a remote installation that runs a Racket script to build
> an installer. It took many days to make that work and make it
> configurable.
> On the plus side, `raco setup' can usefully check package dependencies
> and sort them into "build-time" and "run-time" dependencies, even for
> documentation links, and that checking was relatively easy to
> implement. Since module collection references can be synthesized at
> run time, there's no way to completely check dependencies statically,
> but I think we may end up with something that's more reliable and
> complete than checking in other package systems. If so, maybe that
> helps explain why it was hard.
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev

Posted on the dev mailing list.