[racket-dev] Packages

From: Eli Barzilay (eli at barzilay.org)
Date: Mon Apr 8 18:17:42 EDT 2013

50 minutes ago, Jay McCarthy wrote:
> On Mon, Apr 8, 2013 at 2:18 PM, Eli Barzilay <eli at barzilay.org> wrote:
> > It all starts with how someone is expected to approach developing a
> > "package", regardless of how this is defined in the current
> > system.  A quick way to get to the core of the problem is to
> > consider what you'd expect to do when you start working on some
> > "foo" package.  Under probably most package systems and most
> > distribution systems, you'd begin with a "foo" directory and work
> > there.  But with the new package system, you get an extra level of
> > directory structure: you need to make up a "foo" directory with a
> > "foo" subdirectory (usually).
> 
> I very much agree with you. When you are going to create the "foo"
> package, you first create a directory for "foo":
> 
> mkdir foo
> cd foo
> 
> Then once inside this directory, you put what the "foo" package will
> contain. If it will contain the "bar" collection, then you create
> that:
> 
> mkdir bar
> 
> and populate it.

This not-so-subtle game of definitions is exactly the problem.  When I
create "foo", I want it to correspond to requiring `foo/*' files.
IOW, I do not think about the option of having a "foo" collection
directory in a "foo" package directory.  And in fact, in all of Racket
modulo the package system, the concept of a collection root directory
is used just as a device to look for collections.

But note that I do *not* reject the ability to have a package with
multiple collections.  I'm just saying that this is by far the less
popular use case.


> I don't see how you can start from this place and say, "I am making
> the elis-awesome-stuff package, so therefore I should *not* create
> the elis-awesome-stuff directory". It feels contradictory.

If I *had* published my awesome stuff as a single big package, then
you're right: this would be a fine use.  And as I said, this is
exactly what the "this-and-that" and "mischief" packages do.  But in
my case -- and I suspect in many other cases -- I don't have an
overall awesome directory of things to publish: I have a pile of
things where *some* of them are publishable.  So I'd like to be able
to publish it quickly with

    raco pkg create ~/my-stuff/elibot

and be done with it.  I'll still maintain a single "my-stuff"
repository, and I'll still have ~/my-stuff as the only linked (with a
"--root") collection directory.


> > I know that the way I develop packages (again, not using the
> > formal definition of the package system) does *not* do that.  I
> > also know that many other people don't do that.  In fact, I can
> > think of very few *existing* examples that would benefit from such
> > a thing.  This is in contrast to Jay's view: he said that he used
> > multiple collections per package because he wanted it to "match
> > what users have".
> 
> This comment was specifically in the context of this package system.
> Users of packages get collection roots, so developers of packages
> should have them too. That was my point, not that it corresponded to
> what users already had.

I don't see why this is necessary.  What users have can be more
complicated, since they use the pkg command to install packages.  So
it's fine for them to have the package installed in a new root even
when the publisher doesn't have one.

Of course it would be nice if the package didn't get installed in
its own root (to make it easier to look at the code and contribute to
it), but I don't know if that's easy to do.  (It's certainly doable,
of course, since that's the original use of "raco link".)


> I agree that the one collection per package is common. However, I
> value highly the ability to install into existing collections and
> the ability to separate the name of a package and the name of the
> collection,

Note that I didn't say that these features should be dumped.  Using
existing collections is obviously a needed feature, and having a
separate package name from the collection name is fine too[*].


> because I believe having this sort of external linking makes it
> easier to replace and evolve things into the future. A major problem
> with Planet, in my opinion, was that code mentioned the name of the
> package and so it was difficult to replace the provider.  By having
> the name of the package not in the code, this problem goes away.

I don't follow this point.  But I'm vaguely guessing that this is why you
want to separate the single collection name from the package name.
And BTW, this separation would still be useful now, since I might work
on "foo1" and "foo2" package directories that are both variants of a
single package.


> > * The package = multiple-collections feature is bad
> 
> This has been a fundamental part of the design goals since the first
> conversation I was part of in Chicago 2010.

The ability to have multiple collections was certainly there, and it
is certainly useful.  It's the *equality* that is a problem, the fact
that even in the (very common) case of a single collection you need to
be aware of multiple collections.


> >   * The handin-server and -client should clearly be developed
> >     together -- but they should not be distributed together, since
> >     only the latter is what students need.  The best way that I can
> >     think of to address this is still bad: make them into a single
> >     package, and add instructions on packaging just the client for
> >     students.  It's true that such instructions already exist -- but
> >     there is no reason to complicate these instructions.
> 
> They can still be developed together:
> 
> + handin.git
> -- handin-server.pkg
> *** handin-server code
> -- handin-client.pkg
> *** handin-client code
> 
> Then when you inform the PNR about the "handin-server" package you
> use the path specification to name the appropriate package of the
> single coherent git repository.

Not if you want the people who get the server package to also get a
ready-to-configure directory with the client code.  (Since the raw
client is useless.)  So it would probably be more like this:

    handin/
    handin/handin-server/
    handin/handin-server/...server-code...
    handin/handin-client/
    handin/handin-client/...client-code...

with not-so-easy instructions for the intended audience on how to turn
the latter into a package, or set up a template to make it a little
easier for these users:

    handin/
    handin/handin-server/
    handin/handin-server/...server-code...
    handin/handin-client-root/
    handin/handin-client-root/handin-client/
    handin/handin-client-root/handin-client/...client-code...

and tell them to run the create command on the contained root --
resulting in easier instructions for them, but a complex directory
structure and linking to do when you work on the code.


> I do not find it compelling that tool A's default is not tool B's
> default.

It's an API -- uniformity is always desirable.  IMO, this is no
different from talking about argument order in the list/* functions.


> > But the problems are not only at a technical level, they're also
> > higher up.  Making collection roots into the unit of distribution
> > means that people need to be aware of them.  In fact, this is
> > actually making a "collection root" into a new concept -- before
> > the package system it was just a place to look for toplevel
> > collections, but now it has turned into sometimes a place for
> > collections, and sometimes a container of multi-collection
> > packages (as well as such a place).
> 
> A collection root only has one meaning: the place to look for
> top-level collections.

Certainly not!  It *used* to have just that meaning, but now if I want
to publish packages, then I need to create artificial roots for them.
For running Racket code they obviously are just search entry points,
but they now have an additional meaning as publishing entry points.
That's the equation that I want to get rid of.


> > This is a relatively minor change, but I think that conceptually it
> > greatly simplifies things.
> 
> I cannot tell if you are talking about the change to "raco pkg create"
> or the change to get rid of collection root packages in general.

The first.  I definitely want to keep the ability to have multiple
collections in a dedicated root.

> If you are referring to the second, I disagree because while
> "simplifying" things, it makes other things (like installing into
> existing collections) impossible. That's not a fair trade.

(As a sidenote: this is unrelated to installing into existing
collections, since even with a single collection package, I should be
able to do this

    raco pkg create ~/my-stuff/data/eli-tree

which would result in a collection that would get spliced as usual as
`data/eli-tree'.  So the real reason to keep multiple-collection
packages is just the need to have them, unrelated to existing
collections.)


> > One of the main problems I had with planet is that it was too
> > heavy for random users.  The new system is certainly lighter, but
> > I think that such a change will make it significantly more usable
> > in that it's much closer to "just dump your bunch of files on the
> > web".
> 
> I wish I understood why
> 
> mkdir bar
> scp -r bar public_html:
> 
> is simpler than
> 
> mkdir -p foo/bar
> scp -r foo public_html:
> 
> The current system is *just dump your bunch of files on the web*
> after /picking a name for your package/. I don't understand why you
> wouldn't expect that you have to pick a name.
> 
> I really do wish I understood why. Maybe you could email me
> personally with the directory structure you find so difficult to
> adapt into this, so I could appreciate your problem more directly.

That "elibot" is a real example.  To repeat, I want this to work:

    raco pkg create ~/my-stuff/elibot

The "my-stuff" is my repository with all kinds of stuff in it.  With the
current system, if I want to publish it, I need to move the code into
"~/my-stuff/elibot/elibot", change scripts that use it accordingly
(since this is my irc bot's source, so it's code that is used from
that directory), and I need to do this change in my repository (which
means that it requires a complicated git filter-branch, or starting a
new repo which would be a submodule (in the git sense) of my usual
repo).

Alternatively, I could have a dedicated package root directory which
I'm not really using, with a symlink to the actual collection
directory, which would be used only for publishing.  Note that this
shows why a collection root has an additional meaning: I'm now talking
about two roots: one which is how my code is available, and another
which is used only for publishing.

Either way, this *is* more work, which is done only to publish the
package.  My suggestion makes it unnecessary.  It basically makes
"pkg create" work the way I'd end up with anyway if I used a wrapper
script around it -- one that creates the for-publishing directory.
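For concreteness, here is a rough sketch of that kind of wrapper as a POSIX shell function.  It is an illustration under assumptions, not part of raco: the name `stage_for_publishing` is made up, it copies (rather than symlinks) one collection directory into a throwaway root laid out as NAME/NAME, with NAME taken from the directory's basename, and it prints the package directory to hand to `raco pkg create`.  It assumes, as in the layout Jay describes, that the outer directory's name is what becomes the package name.

```shell
#!/bin/sh
# Hypothetical helper, not part of raco: build a throwaway
# for-publishing root around a single collection directory.
set -eu

# stage_for_publishing COLL_DIR
#   Copies COLL_DIR to $STAGE/NAME/NAME, where NAME is the basename of
#   COLL_DIR and $STAGE is a fresh temporary directory, then prints
#   $STAGE/NAME (the package directory).  The original tree is untouched.
stage_for_publishing() {
  coll_dir=${1%/}                         # drop one trailing slash, if any
  name=$(basename "$coll_dir")            # e.g. "elibot"
  stage=$(mktemp -d)                      # root used only for publishing
  mkdir -p "$stage/$name"
  cp -R "$coll_dir" "$stage/$name/$name"  # copy, so symlink handling is moot
  printf '%s\n' "$stage/$name"
}

# Intended use (assumes raco is on PATH):
#   pkg_dir=$(stage_for_publishing ~/my-stuff/elibot)
#   raco pkg create "$pkg_dir"
```

The repository itself keeps its flat ~/my-stuff/elibot layout; only the temporary copy carries the extra directory level, which is the step the proposed change would have "raco pkg create" do on its own.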

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.