[racket-dev] Packages

From: Jay McCarthy (jay.mccarthy at gmail.com)
Date: Mon Apr 8 17:29:02 EDT 2013

On Mon, Apr 8, 2013 at 2:18 PM, Eli Barzilay <eli at barzilay.org> wrote:
> This is a (long) criticism of the current state of the package system.
> (It is a by-product of PR13669, where I raised that point.)
>
> Executive summary: I very strongly think that "pkg create" should
> change.  See the bottom for my suggestion.
>
> * Most code development happens in a single collection
>
> It all starts at how someone is expected to approach developing a
> "package", regardless of how this is defined in the current system.  A
> quick way to get to the core of the problem is to consider what you'd
> expect to do when you start working on some "foo" package.  Under
> probably most package systems and most distribution systems, you'd
> begin with a "foo" directory and work there.  But with the new package
> system, you get an extra level of directory structure: you need to
> make up a "foo" directory with a "foo" subdirectory (usually).

I very much agree with you. When you are going to create the "foo"
package, you first create a directory for "foo":

mkdir foo
cd foo

Then once inside this directory, you put what the "foo" package will
contain. If it will contain the "bar" collection, then you create

mkdir bar

and populate it.

I start from the same place you do ("you'd begin with a "foo" directory
and work there"), because we are making the "foo" package.

I don't see how you can start from this place and say, "I am making
the elis-awesome-stuff package, so therefore I should *not* create the
elis-awesome-stuff directory". It feels contradictory.

> I know that the way I develop packages (again, not using the formal
> definition of the package system) does *not* do that.  I also know
> that many other people don't do that.  In fact, I can think of very
> few *existing* examples that would benefit from such a thing.  This is
> in contrast to Jay, who said that he used multiple collections per
> package because he wanted it to "match what users have".

This comment was specifically in the context of this package system.
Users of packages get collection roots, so developers of packages
should have them too. That was my point, not that it corresponded to
what users already had.

> So just to make sure that I'm not talking out of my ass, I looked at
> all of the existing packages.  Here are some numbers (manual count, so
> it's all estimated):
>
>   42 Single-directory packages, holding at most some meta stuff like a
>      README file at the top (IIRC, there was only one case with an
>      info.rkt file too).  Out of these, about 8 use an existing
>      collection name like "data", "file", or "net".
>   19 Multi-directory packages.
>
> This makes it look like there's a good case for multi-collection
> packages, but:
>
>   14 of these multi-collection things have a single collection and a
>      "tests" one.  As discussed several times in the past, the general
>      agreement was that it's better to have tests inside the relevant
>      collection (and the future trend is likely to shift to having tests
>      *in* the source code).
>
> So the real numbers are 56 single-collection packages, and 5 multi
> ones.  Of these multi-collection packages, one was exactly the case I
> thought would benefit from this: Carl's "mischief", which is
> declared as "a bunch of stuff" and arguably would benefit from
> splitting into proper packages should there be sufficient demand.  In
> the same category, there is soegaard's even more explicitly named
> "this-and-that".
>
> This leaves exactly 3 cases where the multi-collection is really used
> -- and two of them are Jay's packages (the other is from dvanhorn).
> I take this as reaffirming my guess that pretty much all development
> happens in a single collection.  I'll note that there is, however, a
> point for using existing collection names -- not a strong one, but
> the share of packages that use an existing collection name (~20%, 8
> of the first 42) was roughly the same in the multi-packages too.
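For reference, the hand-counted figures in the quoted survey reconcile as follows. This is only the arithmetic on the estimates given above; no new data is introduced.

```latex
\begin{aligned}
\text{single-collection packages} &= 42 + 14 = 56\\
\text{multi-collection packages}  &= 19 - 14 = 5\\
\text{genuine multi-collection}   &= 5 - 2 = 3
  \quad(\text{excluding ``mischief'' and ``this-and-that''})\\
\text{share using an existing collection name} &= 8/42 \approx 0.19 \approx 20\%
\end{aligned}
```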

I agree that one collection per package is common. However, I
value highly the ability to install into existing collections and the
ability to separate the name of a package from the name of the
collection, because I believe having this sort of external linking
makes it easier to replace and evolve things in the future. A
major problem with Planet, in my opinion, was that code mentioned the
name of the package, so it was difficult to replace the provider.
By keeping the name of the package out of the code, this problem goes
away.

> * The package = multiple-collections feature is bad

This has been a fundamental part of the design goals since the first
conversation I was part of, in Chicago in 2010.

> Given the above, one way in which multiple collections per package
> are bad is obvious: it's yet another obstacle in the way of a random
> hacker who wants to contribute code.  It means that developers need to
> unnaturally move code into a subdirectory, including existing code in
> repositories.  That's a *real* problem in some cases.  Two quick
> examples:
>
>   * Like many other people, I have my "directory of stuff", with
>     random code and random collections.  If I make that into a
>     package, then any later publishing of some part of it as its own
>     package means that I need to shuffle files around.  With an
>     existing repository, and especially if I want to maintain my
>     revision history, this leads to yet more acrobatics than a quick
>     move to a subdirectory.
>   * The handin-server and -client should clearly be developed
>     together -- but they should not be distributed together, since
>     only the latter is what students need.  The best way that I can
>     think of to address this is still bad: make them into a single
>     package, and add instructions on packaging just the client to
>     students.  It's true that such instructions already exist -- but
>     there is no reason to complicate these instructions.

They can still be developed together:

+ handin.git
-- handin-server.pkg
*** handin-server code
-- handin-client.pkg
*** handin-client code

Then, when you inform the PNR about the "handin-server" package, you use
the path specification to name the appropriate package directory within
the single coherent git repository.
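A minimal shell sketch of that layout, assuming (hypothetically) that each package directory holds a single collection of the same name and that main.rkt is just a placeholder file. Only the server/client split within one repository comes from the discussion; the git steps are omitted and only the directory tree is built.

```shell
#!/bin/sh
# Sketch of the handin.git layout: one repository, two packages.
# Assumptions (hypothetical, for illustration only): each package
# directory holds a collection of the same name, and main.rkt is a
# placeholder file. No git commands are run; only the tree is created.

base=$(mktemp -d)

for pkg in handin-server handin-client; do
  # <repository>/<package directory>/<collection directory>
  mkdir -p "$base/handin/$pkg/$pkg"
  touch "$base/handin/$pkg/$pkg/main.rkt"
done

# List the resulting files relative to the repository root.
(cd "$base/handin" && find . -type f | sort)
```

Publishing only the client would then mean pointing the package source at the handin-client subdirectory of the repository, so students never receive the server code; the exact source syntax for naming a subdirectory is not shown here.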

> Another way to see why multiple collections per package are bad is to
> consider the "raco link" command.  This command takes the collection
> *names*, and originally this was the only thing it did.  Only after
> Matthew implemented it did he add the `--root' option -- and he did
> that after a request that I made, with the explicit scenario in mind of
> accommodating such a "directory of stuff".  The package system, as
> currently implemented, takes this non-default `--root' flag, and
> adopts its behavior as the default.

I do not find it compelling that tool A's default is not tool B's default.

> But the problems are not only at a technical level, they're also
> higher up.  Making collection roots into the unit of distribution
> means that people need to be aware of them.  In fact, this is actually
> making a "collection root" into a new concept -- before the package
> system it was just a place to look for toplevel collections, but now
> it has turned into sometimes a place for collections, and sometimes a
> container of multi-collection packages (as well as such a place).

A collection root only has one meaning: the place to look for
top-level collections.

Racket has supported multiple collection roots for a long time; it's
just that they were very inconvenient to use, so most people haven't
used them.

> * What can be done
>
> Just to be clear, I completely agree that it would be insane at this
> point to do some kind of incompatible change.  But looking at the
> list of "raco pkg" subcommands, there's one command ("install") that
> deals with a package URL, several commands that deal with the name of
> an installed package, and one command -- "create" -- that deals with
> these "package directories".  So if just this command is revisited, the
> issue can be resolved.
>
> I originally thought that it makes sense to have a new command
> that packages specified collection directories (or names) instead of
> collection roots.  It's a small change: you just name the
> collection(s) instead of naming a root that has the collections as
> subdirectories.
>
> Jay suggested "packagify", which was actually a good hint for me to
> do this writeup: I thought about what exactly bothers me about having
> such a weird name -- and the thing is that I think that it's this
> command that should be the more popular one, so a weird name for it
> would not be a good choice.  The next obvious thing to consider is a
> better name -- something like "pack" -- and the problem with that is
> that it will be very confusing for users to have both "pack" and
> "create" with these subtle differences.
>
> The next name that I considered was something like "pack-collection",
> possibly together with renaming "create" to "pack-collection-root".
> But this is bad also for exactly the reason that Matthew gave in the
> PR, which I think is a very good guideline for this suggestion and for
> the overall design of the package system:
> | I thought it would be a simpler path for people who already
> | understand collections, but it turned out to be more complex and
> | more confusing to have more ways of doing things.
> So the problem with having two "pack-" or "create-" variants is that
> people would still need to be aware of both, and more specifically,
> the concept of a "collection root directory" (or whatever it gets
> called) doesn't go away.
>
> Following the "raco link" model, I now think that the package system
> (or specifically, the "create" command) should do exactly what "raco
> link" does: the default would accept a collection directory and make
> it into a package, and with a "--root" flag, it would package up the
> whole specified collection root.
>
> There are a few technical details to deal with.  The few that I see
> are:
>
> * What happens when there is more than one collection specified in a
>   single "create" command.  Following the above analysis of existing
>   packages, I think that it makes sense to have the "main" collection
>   be the first, and optionally further "support" collections specified
>   later -- which means that the package meta-data is taken from the
>   first collection.  The reason this follows what I see now is that
>   most cases of two directories had a "tests" directory, and it makes
>   sense to do something like
>     raco pkg create path/to/foo path/to/tests/foo
>   and given that "path/to" is one of my roots, the "create" command
>   will package the two collections of "foo" and "tests/foo".
> * Another question is what happens when I specify a collection that is
>   not a toplevel collection.  This can be handled as I described
>   above: track it to its root, and use that as the path to the
>   collection in the package.
> * Finally, there's the question of manually packaged directories and
>   single-collection repositories.  I think that both of these cases
>   should be dealt with in a similar way -- when you create a package
>   URL, you also specify whether it is in "--root" mode or not, with the
>   default being off.  Existing URLs will be treated as being in
>   "--root" mode so they all continue to work fine.
>   Alternatively, this could be specified in the package's toplevel
>   info.rkt file, which "pkg create" would check, but with a default of
>   non-"--root" this means changing existing repositories.
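The "track it to its root" step in the quoted proposal amounts to stripping the collection-root prefix from each collection directory. A minimal POSIX shell sketch, assuming the root is already known and passed explicitly; the function name collection_path is hypothetical and is not part of raco.

```shell
#!/bin/sh
# Hypothetical helper (not a raco feature): given a collection root and a
# collection directory under it, print the collection's path relative to
# the root, e.g. "foo" or "tests/foo".
collection_path() {
  root=${1%/}   # drop a trailing slash from the root, if any
  dir=${2%/}    # same for the collection directory
  case "$dir" in
    "$root"/*) printf '%s\n' "${dir#"$root"/}" ;;
    *) echo "error: $dir is not under $root" >&2; return 1 ;;
  esac
}

# The example from the proposal, where "path/to" is a collection root.
collection_path path/to path/to/foo        # prints: foo
collection_path path/to path/to/tests/foo  # prints: tests/foo
```

Under the proposal, `raco pkg create path/to/foo path/to/tests/foo` would then place the two collections at foo and tests/foo inside the package; that command line is the suggested syntax from the quoted message, not a description of current behavior.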

I agree with Matthew & Matthias regarding patches and this.

> This is a relatively minor change, but I think that conceptually it
> greatly simplifies things.

I cannot tell if you are talking about the change to "raco pkg create"
or the change to get rid of collection root packages in general. If
you are referring to the second, I disagree because while
"simplifying" things, it makes other things (like installing into
existing collections) impossible. That's not a fair trade.

>  One of the main problems I had with planet
> is that it was too heavy for random users.  The new system is
> certainly lighter, but I think that such a change will make it
> significantly more usable in that it's much closer to "just dump your
> bunch of files on the web".

I wish I understood why

mkdir bar
scp -r bar public_html:

is simpler than

mkdir -p foo/bar
scp -r foo public_html:

The current system is *just dump your bunch of files on the web* after
/picking a name for your package/. I don't understand why you wouldn't
expect that you have to pick a name.

I really do wish I understood why. Maybe you could email me personally
with the directory structure you find so difficult to adapt into this,
so I could appreciate your problem more directly.


Jay McCarthy <jay at cs.byu.edu>
Assistant Professor / Brigham Young University

"The glory of God is Intelligence" - D&C 93

Posted on the dev mailing list.