[racket-dev] Packages

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Mon Apr 8 16:41:48 EDT 2013

I stand by my recommendation from December:


That is, I think this suggestion should be phrased as a patch.

As implied in my quote below, I tried something much like you're
describing, and I was unhappy with the resulting complications. Maybe I
implemented the idea wrong, and maybe you'll come up with something
better --- all the more reason to phrase your suggestion as a patch.

At Mon, 8 Apr 2013 16:18:18 -0400, Eli Barzilay wrote:
> This is a (long) criticism of the current state of the package system.
> (It is a by-product of PR13669, where I raised that point.)
> Executive summary: I very strongly think that "pkg create" should
> change.  See the bottom for my suggestion.
>
> * Most code development happens in a single collection
>
> It all starts at how someone is expected to approach developing a
> "package", regardless of how this is defined in the current system.  A
> quick way to get to the core of the problem is to consider what you'd
> expect to do when you start working on some "foo" package.  Under
> probably most package systems and most distribution systems, you'd
> begin with a "foo" directory and work there.  But with the new package
> system, you get an extra level of directory structure: you need to
> make up a "foo" directory with a "foo" subdirectory (usually).
> I know that the way I develop packages (again, not using the formal
> definition of the package system) does *not* do that.  I also know
> that many other people don't do that.  In fact, I can think of very
> few *existing* examples that would benefit from such a thing.  This is
> in contrast to Jay's view, who said that he used multiple collections
> per package because he wanted it to "match what users have".
> So just to make sure that I'm not talking out of my ass, I looked at
> all of the existing packages.  Here are some numbers (manual count, so
> it's all estimated):
>   42 Single-directory packages, holding at most some meta stuff like a
>      README file at the top (IIRC, there was only one case with an
>      info.rkt file too).  Out of these, about 8 use an existing
>      collection name like "data", "file", or "net".
>   19 Multi-directory packages.
> This makes it look like there's a good case for multi-collection
> packages, but:
>   14 of these multi-collection things have a single collection and a
>      "tests" one.  As discussed in the past several times, the general
>      agreement was that it's better to have tests inside the relevant
>      collection (and the future trend is likely to shift to have tests
>      *in* the source code).
> So the real numbers are 56 single-collection packages, and 5 multi
> ones.  Of these multi-collection packages, one was exactly the case
> that I thought would benefit from this: Carl's "mischief", which is
> declared as "a bunch of stuff" and arguably would benefit from
> splitting into proper packages should there be sufficient demand.  In
> the same category, there is soegaard's even more explicitly named
> "this-and-that".
> This leaves exactly 3 cases where the multi-collection is really used
> -- and two of them are Jay's packages (the other is from dvanhorn).
> I take this as reaffirming my guess that pretty much all development
> happens in a single collection.  I'll note that there is, however, a
> point for using existing collection names -- not a strong one, but
> the ~20% (8 of the first 42) number of packages that use an existing
> collection name was roughly the same in the multi-packages too.
>
> * The package = multiple-collections feature is bad
>
> Given the above, one way in which the multiple-collections per package
> is bad is obvious: it's yet another complication in the way of a
> random hacker contributing code.  It means that developers need to
> unnaturally move code into a subdirectory, including existing code in
> repositories.  That's a *real* problem in some cases.  Two quick
> examples:
>   * Like many other people, I have my "directory of stuff", with
>     random code and random collections.  If I make that into a
>     package, then any later publishing of some part of it as its own
>     package means that I need to shuffle files around.  With an
>     existing repository, and especially if I want to maintain my
>     revision history, this leads to yet more acrobatics than a quick
>     move to a subdirectory.
>   * The handin-server and -client should clearly be developed
>     together -- but they should not be distributed together, since
>     only the latter is what students need.  The best way that I can
>     think of to address this is still bad: make them into a single
>     package, and add instructions on packaging just the client for
>     students.  It's true that such instructions already exist -- but
>     there is no reason to complicate these instructions.
> Another way to see why multiple collections per package are bad is to
> consider the "raco link" command.  This command takes the collection
> *names*, and originally this was the only thing it did.  Only after
> Matthew implemented it did he add the `--root' option -- and he did
> that after a request that I made, with the explicit scenario in mind of
> accommodating such a "directory of stuff".  The package system, as
> currently implemented, takes this non-default `--root' flag, and
> adopts its behavior as the default.
> But the problems are not only at a technical level, they're also
> higher up.  Making collection roots into the unit of distribution
> means that people need to be aware of them.  In fact, this is actually
> making a "collection root" into a new concept -- before the package
> system it was just a place to look for toplevel collections, but now
> it has turned into sometimes a place for collections, and sometimes a
> container of multi-collection packages (as well as such a place).
>
> * What can be done
>
> Just to be clear, I completely agree that it would be insane at this
> point to do some kind of an incompatible change.  But looking at the
> list of "raco pkg" subcommands, there's one command ("install") that
> deals with a package URL, several commands that deal with the name of
> an installed package, and one command -- "create" -- that deals with
> these "package directories".  So if just this command is revisited, the
> issue can be resolved.
> I originally thought that it makes sense to have a new command
> that packages specified collection directories (or names) instead of
> collection roots.  It's a small change: you just name the
> collection(s) instead of naming a root that has the collections as
> subdirectories.
> Jay suggested "packagify", which was actually a good hint for me to
> do this writeup: I thought about what exactly bothers me about having
> such a weird name -- and the thing is that I think that it's this
> command that should be the more popular one, so a weird name for it
> would not be a good choice.  The next obvious thing to consider is a
> better name -- something like "pack" -- and the problem with that is
> that it will be very confusing for users to have both "pack" and
> "create" with these subtle differences.
> The next name that I considered was something like "pack-collection",
> or even possibly adding "pack-collection" and renaming "create"
> to "pack-collection-root".  But this is bad also for exactly the
> reason that Matthew said in the PR, which I think is a very good
> guideline for this suggestion and for the overall design of the package
> system:
> | I thought it would be a simpler path for people who already
> | understand collections, but it turned out to be more complex and
> | more confusing to have more ways of doing things.
> So the problem of having two "pack-" or "create-" variants is that
> people should still be aware of the two things, and more specifically,
> the concept of a "collection root directory" (or whatever it gets
> called) doesn't go away.
> Following "raco link", I now think that the package system (or
> specifically, the "create" command) should do exactly what "raco link" does:
> the default would accept a collection directory and make it into a
> package, and with a "--root" flag, it would package up the whole
> specified collection root.
> There are a few technical details to deal with.  The few that I see
> are:
> * What happens when there is more than one collection specified in a
>   single "create" command.  Following the above analysis of existing
>   packages, I think that it makes sense to have the "main" collection
>   be the first, and optionally further "support" collections specified
>   later -- which means that the package meta-data is taken from the
>   first collection.  The reason this follows what I see now is that
>   most cases of two directories had a "tests" directory, and it makes
>   sense to do something like
>     raco pkg create path/to/foo path/to/tests/foo
>   and given that "path/to" is one of my roots, the "create" command
>   will package the two collections of "foo" and "tests/foo".
> * Another question is what happens when I specify a collection that is
>   not a toplevel collection.  The way that this can be done is what I
>   wrote above: track it to its root, and use that as the path to the
>   collection in the package.
> * Finally, there's the question of manually packaged directories and
>   single-collection repositories.  I think that both of these cases
>   should be dealt with in a similar way -- when you create a package
>   URL, you also specify whether it is in "--root" mode or not, with the
>   default being off.  Existing URLs will be treated as being in
>   "--root" mode so they all continue to work fine.
>   Alternatively, this could be specified in the package's toplevel
>   info.rkt file, which "pkg create" would check, but with a default of
>   non-"--root" this means changing existing repositories.
> This is a relatively minor change, but I think that conceptually it
> greatly simplifies things.  One of the main problems I had with planet
> is that it was too heavy for random users.  The new system is
> certainly lighter, but I think that such a change will make it
> significantly more usable in that it's much closer to "just dump your
> bunch of files on the web".
> -- 
>           ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                     http://barzilay.org/                   Maze is Life!
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev

Posted on the dev mailing list.