[racket-dev] Planet and Packages (was Re: PLaneT(2): Single vs multi-collection packages)

From: Sam Tobin-Hochstadt (samth at ccs.neu.edu)
Date: Thu Jul 4 14:02:37 EDT 2013

Neil,

You clearly put a bunch of thought into this email, so I think it
needs a response. I've changed the subject to put this in a new
thread.

On Fri, Jun 14, 2013 at 10:48 PM, Neil Van Dyke <neil at neilvandyke.org> wrote:

> For all of my packages, as well as any package I can imagine, I think that
> the original PLaneT got many things right or close to right:

Some of these features the new package system has:

> * Files from a package-version end up grouped together in the directory
> structure, specific to that package-version, and certainly not mixed into
> directories with files from package-versions of different package-names.

While collections now "splice", this is still true.  When I install a
package, it creates a new directory and everything goes there. For
example, `raco pkg install -i gcstats` produces the directory
`/home/samth/sw/plt/racket/lib/pkgs/gcstats`.

> * Flat namespace (let's ignore the PLaneT package-owner part for now),
> without attempt to name packages according to some topical ontology.

This is still true.

> * Metadata in "info.rkt".

Also still true.

> * Some kind of unique package-name controlled by a developer.

This is true to the same degree it was with Planet. That is, if you
want to register your package at `pkg.racket-lang.org`, then you have
to have a unique name. Otherwise, you can locally install whatever you
want and give it a name that conflicts, or run your own server with
similar consequences.

Some things are different:

> * Multiple versions for a given package-name (I'll call them
> package-versions in this email) can be installed, and there is some version
> selection mechanism.

This is no longer true, and this is where the new system differs from
other systems such as 'npm' as well. I think this was, in fact, a
major _problem_ with Planet, and one that I'm personally glad the new
system doesn't have.  I think there are a few problems with what
Planet did:

1. Many packages can't really work with multiple versions in the same
program, whether because of generative structs, or files used, or many
other issues.

2. The way Planet versions were selected was mostly hardwired, and
everything was hard to upgrade.

There were other problems, too, but these are the ones I remember
being most significant.
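
To make the generative-struct point concrete, here is a minimal sketch
(not Planet's actual loading code). Calling `make-point-library` twice
stands in for loading two versions of the same package; that name, and
the `point` struct, are made up for illustration.

```racket
#lang racket/base

;; Each call evaluates the `struct` form again, which creates a brand-new
;; structure type, much as loading a second copy of a library would.
(define (make-point-library)
  (struct point (x y))
  (values point point? point-x))

;; Two "versions" of the same (hypothetical) library.
(define-values (make-point-v1 point-v1? point-v1-x) (make-point-library))
(define-values (make-point-v2 point-v2? point-v2-x) (make-point-library))

(define p (make-point-v1 1 2))

(point-v1? p) ; => #t
(point-v2? p) ; => #f, same source text but a different structure type
```

A value built by one copy is rejected by the other copy's predicate and
accessors, even though the two definitions are textually identical.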

However, I _think_ (the documentation is pretty terse) that the
`--scope-dir` option to `raco pkg install` can help you simulate a
more npm-like workflow.

> I was expecting to use this original PLaneT as a starting point, and evolve
> it in ways like the following...
>
> * In addition to the "(planet ...)" require-specs, package-versions also can
> come from "http:", "https:", and "git:" URLs.  ("github:" would also be OK.)
> Each such URL would identify trees or a tarball.  Then we see how people
> choose the PLaneT server vs. HTTP vs. Git over time.

Package dependencies can be specified with URLs, which can point to
remote directories, remote ZIP (or tar, etc.) files, local files or
directories, or GitHub repositories.  I hope to add support for
arbitrary git remotes soon (if the `git` binary is available).
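
As a rough sketch of what that looks like in practice, an `info.rkt`
can list package sources directly in its `deps`. The `example.com` URL
below is a made-up placeholder, and the exact set of accepted source
forms may differ between Racket versions, so treat this as illustrative
rather than definitive.

```racket
#lang setup/infotab
;; Sketch only: the dependencies listed here are illustrative.
(define deps
  '(;; a GitHub repository, on its master branch
    "github://github.com/samth/add-blaster/master"
    ;; a remote ZIP file (hypothetical URL)
    "http://example.com/pkgs/some-lib.zip"))
```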

> * Maybe improve the version-selection and compatibility support.
> Investigate whether it's worthwhile to separate out the
> backward-compatibility information from the static package-version
> distribution (and especially from the version number), or whether in
> practice there are simpler ways that are satisfactory.

I think decoupling the source code from the version specification is
exactly the improvement wanted here.

> * Maybe a facility in "info.rkt" to provide aliases for require specs.
> Otherwise, people writing nontrivial multi-file code that uses other
> packages from PLaneT/whatever end up having to make wrapper modules so that
> we don't goof our require-specs and accidentally depend on multiple
> package-versions for the same package-name.  Note that, with URLs, these
> aliases *might* be the only actual package-name construct in the HTTP/Git
> system as distinct from URL similarities of package-versions.  This info
> might be implicit in a package-version's "info.rkt"'s reference to a
> previous package-version, perhaps coming from an assertion of compatibility
> info.  This might be simpler than it might sound, but it has some
> interesting implications, including for forking and web-of-trust.

Again, this is addressed by no longer having Planet's treatment of
versions and require specs.

> * Simple web-of-trust package-version public-key signing of package-versions
> (e.g., URLs plus hashes of contents), to start with, perhaps initially with
> only centralized repository for signatures.  Soon build distributed
> web-of-trust, plus multiple repositories so organizations have option to
> keep their signatures separate.  Build mechanisms atop that, including
> advancing the state of the art.

I think package signing is something we'll eventually need, but I also
don't think it's on the critical path.  It's also something that can
be added to the package system later.


> * Automate and simplify releasing in general.  With PLaneT, it's been
> not-unusual for even core Racket developers to avoid releasing some add-on
> code to PLaneT, perhaps because the clerical stuff was a headache.  For the
> old PLaneT, I was simplifying this with McFly, but with a new package
> mechanism, I would start with that and then ask what clerical parts still
> need help.  (For example, if doing development in an SCM repository that's
> accessed directly via require-specs, then releasing a package-version might
> consist mainly of adding a tag/label.  planet-lang.org's directory might
> even update automatically, given info about a previously-released
> package-version of the same package-name.)

Releasing packages is much, much easier in the new system.  For
starters, this is a working package:

    https://github.com/samth/add-blaster

You can install it with

    raco pkg install github://github.com/samth/add-blaster/master

(I hope we can make that command line shorter).

Second, you can add this to `pkg.racket-lang.org` with a few clicks.
This can be automated once `pkg.racket-lang.org` has an API. Further,
the listing automatically updates when the GitHub repository changes.

> * Use submodule support to support single-file packages, at least for the
> HTTP/Git package-versions.  "(module+ info ...)".  It seems from Emacs
> history that some people really like the single-file module, it lowers
> barriers, and now submodules give us an easy way to finally do it.

This is potentially an interesting idea, although the info.rkt file
format is quite restricted to enable reading it without running
arbitrary code, and this might be hard to integrate with submodules.
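
For concreteness, here's roughly what I imagine such a single-file
package would look like. This is purely a hypothetical sketch: nothing
in the current package system looks for an `info` submodule, and
`greet` is just a made-up stand-in for the library's real code.

```racket
#lang racket/base

;; Ordinary library code of the hypothetical single-file package.
(provide greet)
(define (greet name)
  (string-append "Hello, " name))

;; Hypothetical metadata submodule; no existing tool is assumed to read it.
(module+ info
  (provide deps pkg-desc)
  (define deps '())
  (define pkg-desc "A one-file example package"))
```

Because a `module+` submodule can refer to the enclosing module's
bindings, requiring the `info` submodule to get at `deps` would also
instantiate the enclosing module, which is exactly the arbitrary-code
problem the restricted `info.rkt` format avoids.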

> * Do whatever is necessary to avoid blocking the program for
> few/several-minutes while documentation is reformatted, when requiring an
> uncached package-version.  Maybe even moving it to an async process that's
> run when idle (Unix "nice"?) would work.

First, the split between documentation and code for many packages will
help here. Second, I think this is a lower-level issue than the
package system -- in-tree builds have this issue too.

Also, what program is being blocked here?  The installation process? DrRacket?

> * To put it vaguely: keep things simple in most cases, but don't dumb-down
> in practically restrictive ways, and keep an eye out for places to
> experiment with potential big wins for immediate practice or research.  Some
> things I just mentioned above would surely need refinement/exclusion based
> on this principle.  For another example, I heard some comments at one point
> about a package name being an interface, and multiple sources being able to
> provide implementations matching that interface.  I don't know the current
> plans for that, but I wouldn't make any special mechanism for that.  For
> another example, don't try to dumb-down package-names, as if the first
> person to make a package concerning the generic concept "foo" has the
> be-all-end-all package for all things "foo".

Package names aren't interfaces, but module names (like `data/list`)
can be thought of that way, and multiple packages can provide the same
one.  However, this kind of overlap is discouraged.

I'm not sure what would be a less 'dumbed-down' but still flat
namespace. Do you have example suggestions?  The flat namespace has
worked well with other languages with huge numbers of packages.

> * Some Web directory of software on "planet-lang.org" (with JSON dump and
> maybe query), which includes both PLaneT packages together with the HTTP/Git
> packages that people have chosen to list in the directory.  It's a "this is
> all the Racket packages we know about, and probably easier to find via
> search here than via Google."  (Eventually, this would be hooked up to the
> site-wide search feature for "planet-lang.org", together with other
> categories of other searchable Racket-related info that we identify.  Then
> DrRacket search could be hooked up to that.)

Currently, `pkg.racket-lang.org` doesn't include the contents of
`planet-compat.racket-lang.org`, but I suppose it could. It's already
searchable.

I hope that's helpful for clarifying some of the current design, and
it's great to have feedback from a heavy user of Planet.

Sam

Posted on the dev mailing list.