No subject

From: ()
Date: Mon Dec 3 19:58:15 EST 2012

three categories:

 * Package-system design
   
 * Repository organization

 * Concerns that a more distributed ecosystem means a less unified one

Let's take them one at a time.

** Package-system design

We all appreciate the work that Jay did to design the package
system. I hear lingering concern about the design, including its
limited support for versioning (just dependency checks), the fact that
the package system is outside the module system (no built-in
auto-download of packages, although a tool like DrRacket can suggest
package installs in response to missing-library exceptions), its
stance on conflicts (simply disallowed), and its flat namespace (which
could make conflicts more frequent).

On some of the points, I think reasonable people will disagree. We've
had a years-long discussion, and we've been paying attention to
precedents. We've explored some nearby alternatives to the current
design (I'm thinking of single-collection versus multi-collection
packages). I think we've gotten as close to consensus as possible.

** Repository organization

As we try to split the Racket repository into packages, the questions
concern how finely to split the repository and how to eventually
allocate packages to source-code repositories.

I think the initial split of the Racket repository went more smoothly
than anyone expected. It was fairly easy, for example, to extract a
relatively small core to run `raco pkg', or to draw a line between
DrRacket and the teaching languages. I chalk that up to general
competence among the Racket implementors: big systems must be
developed in layers, whether the layers are declared or not.

In fact, it has worked out so well that the splitting of Racket into
packages has taken a more aggressive form than I expected. At this
point, we've split the Racket repository into 137(!) packages, and
that number is still growing. Two of us tried to make a coarser split,
and it didn't feel right. Others have since started shuffling packages
and continue to split things further. We seem to really like declaring
dependencies and reducing unrequested functionality.

Given that packages are going to be split finely, the question of
allocating packages to repositories is less straightforward. We've
concluded that "scribble-lib" and "scribble-doc" are good to have as
separate packages, but we don't want Scribble's implementation and its
documentation to end up in separate source-code repositories. At the
same time, putting everything in one
big repository is intractable, at least at the point where we want
packages downloaded directly from a repository. (A package can be a
subdirectory of a repository, but the package manager has to download
a tarball of the entire tree to extract the subdirectory.) So, under
"pkgs", we have an extra layer in the directory hierarchy to reflect
an intended organization into repositories. Using a layer of
directories is consistent with git submodules, if we choose to go that
way.

The fact that many of us have tried and arrived at the same conclusion
on granularity gives me confidence that it's a reasonable conclusion,
but the current Racket repository organization really does feel
complex. For example, the core of `raco setup' is

   racket/lib/collects/setup/setup-unit.rkt

while the Scribble part of `raco setup' is in

   pkgs/racket-pkgs/racket-index/setup/scribble.rkt

Those paths reflect that `raco setup' is mostly core functionality,
but you don't get documentation setup until you install the
"racket-index" package, which is currently grouped with other
almost-core packages.

This example also illustrates how the current organization relies on
collection splicing in a big way. In the long run, not many
collections are going to be spliced so much as, say, "racket" and
"data", but splicing two or three times to separate modules,
documentation, and tests may turn out to be common.
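
To make the splicing idea concrete, here is a minimal sketch in Python
(not Racket, and not how the module system is actually implemented). It
assumes a collection is simply the union of same-named directories under
several roots. The two entries reuse the "setup" paths quoted above; the
dictionary standing in for the filesystem and the names `ROOTS`,
`resolve`, and `contributors` are made up for illustration.

```python
from typing import Dict, List, Optional, Set

# Each root contributes files to collections. A dictionary stands in for
# the filesystem here; the entries mirror the two "setup" paths above.
ROOTS: Dict[str, Set[str]] = {
    "racket/lib/collects": {"setup/setup-unit.rkt"},
    "pkgs/racket-pkgs/racket-index": {"setup/scribble.rkt"},
}


def resolve(module_path: str, roots: Dict[str, Set[str]] = ROOTS) -> Optional[str]:
    """Return the full path from the first root that supplies module_path."""
    for root, files in roots.items():
        if module_path in files:
            return f"{root}/{module_path}"
    return None


def contributors(collection: str, roots: Dict[str, Set[str]] = ROOTS) -> List[str]:
    """List every root that contributes at least one file to a collection."""
    prefix = collection + "/"
    return [root for root, files in roots.items()
            if any(f.startswith(prefix) for f in files)]
```

In this sketch, `contributors("setup")` reports both roots, which is
what it means for the "setup" collection to be spliced.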

And then there's

   pkgs/drracket-pkgs/drracket/drracket/drracket.rkt
             ^          ^         ^          ^
            repo      package  collection  module

Every layer before a "/" has multiple descendants, so the layers are
not trivially collapsed. If you just look at the path, it seems
crazy. But if you're expecting <repo>/<package>/<collection>/<module>,
then hopefully it seems reasonable.
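
As a small illustration of reading a path that way, the following
Python sketch splits a path under "pkgs" into the four layers. The
names `SourcePath` and `split_pkg_path` are hypothetical, and the sketch
assumes the layout is exactly pkgs/<repo>/<package>/<collection>/<module>,
with anything after the collection treated as the module path.

```python
from typing import NamedTuple


class SourcePath(NamedTuple):
    repo: str
    package: str
    collection: str
    module: str


def split_pkg_path(path: str) -> SourcePath:
    """Split pkgs/<repo>/<package>/<collection>/<module> into its layers."""
    parts = path.split("/")
    if len(parts) < 5 or parts[0] != "pkgs":
        raise ValueError(f"unexpected layout: {path}")
    # Anything after the collection directory is the module path,
    # which may itself contain subdirectories.
    return SourcePath(parts[1], parts[2], parts[3], "/".join(parts[4:]))
```

Applied to the DrRacket path above, this yields repo "drracket-pkgs",
package "drracket", collection "drracket", and module "drracket.rkt".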

In short, the current layout is driven by three factors: a bias toward
fine-grained packages, a sense that it's good to reflect layers and
dependencies via separate filesystem directories, and some constraints
on how directories relate to git repositories. Unless we change those
driving factors, I don't see us arriving at a simpler organization.

** Distributed versus unified ecosystem

While less prominent than the other categories, I'm also hearing some
concern that splitting up the Racket repository and reorganizing
various pieces of infrastructure will lead to a less unified system
--- or even a less unified community.

Moving our products and infrastructure into a more distributed form is
one of my main goals, but I don't think that "distributed" has to mean
"fragmented". It seems to me that the more distributed we are able to
make our world (the Internet, git, etc.), the more closely we are able
to work together. The math behind that effect eludes me, but I believe
in it, anyway.

At the same time, the sudden emphasis on reorganizing the Racket
repository could also give the impression that the new package system
is primarily about distributing Racket, and not about "third-party"
libraries and packages. I think we're trying to make as much of our
code as possible be treated as "third-party", and thus ensure that all
parties are well supported.


Why Aren't We There, Yet?
-------------------------

We're hardly the first to design a package system or apply it to a big
system, and most of the time I can't shake the sense that we're just
reinventing the wheel. Along those lines, implementing the mechanics
of the package system has been suspiciously difficult.

I hope that part of the reason is our commitment to documentation ---
that it exists, that it builds reliably, that it's richly formatted,
and that it is pervasively cross-referenced and hyperlinked. I don't
think that any package system delivers documentation that's anything
like ours.

Could it also be an unusual commitment to relative paths, especially
when distributing pre-built items? A lot of problems go away if you
know that the library is going to be in "/usr/local/lib".

Surely part of it is trying to make `raco setup' fast for installing
packages. It's complex and fragile to perform an incremental
computation based on changes inferred from filesystem state.

Bootstrapping, at least, is known to be tricky. The Racket compiler
isn't written in Racket, yet, but the installer-creator installs
Racket packages to create a local installation that is used to set up
packages on a remote installation that runs a Racket script to build
an installer. It took many days to make that work and make it
configurable.

On the plus side, `raco setup' can usefully check package dependencies
and sort them into "build-time" and "run-time" dependencies, even for
documentation links, and that checking was relatively easy to
implement. Since module collection references can be synthesized at
run time, there's no way to completely check dependencies statically,
but I think we may end up with something that's more reliable and
complete than checking in other package systems. If so, maybe that
helps explain why it was hard.
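
For a rough picture of that kind of check, here is a Python sketch. It
is not the `raco setup' implementation. It assumes the references have
already been collected as (referencing file, supplying package) pairs,
and it assumes, purely for illustration, that a reference from a
".scrbl" documentation source counts as build-time while any other
reference counts as run-time. The function names and the package names
in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# (referencing file, package that supplies the referenced module)
Reference = Tuple[str, str]


def classify_deps(refs: Iterable[Reference], self_pkg: str) -> Dict[str, Set[str]]:
    """Sort the packages a package refers to into run-time and build-time."""
    run_time: Set[str] = set()
    build_time: Set[str] = set()
    for source_file, pkg in refs:
        if pkg == self_pkg:
            continue  # a package never depends on itself
        if source_file.endswith(".scrbl"):
            build_time.add(pkg)  # assumption: doc sources need it only to build
        else:
            run_time.add(pkg)
    # A run-time dependency is also present at build time, so report it once.
    build_time -= run_time
    return {"run-time": run_time, "build-time": build_time}


def missing_deps(needed: Dict[str, Set[str]],
                 declared_run: Set[str],
                 declared_build: Set[str]) -> Dict[str, Set[str]]:
    """Report needed packages that the package fails to declare."""
    return {
        "run-time": needed["run-time"] - declared_run,
        # A declared run-time dependency also satisfies a build-time need.
        "build-time": needed["build-time"] - declared_build - declared_run,
    }
```

For example, a package whose "main.rkt" refers to a module from "base"
and whose "manual.scrbl" refers to "scribble-lib" would, under these
assumptions, need "base" at run time and "scribble-lib" only at build
time. References synthesized at run time would be invisible to a check
like this one.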


Posted on the dev mailing list.