[racket-dev] Packaging

From: Eli Barzilay (eli at barzilay.org)
Date: Sat Feb 19 16:14:59 EST 2011

IMO, the first three sections look good, but there are some lessons
missing, which can lead to similar kinds of mistakes.  I'll try to
describe the missing principles below, together with other issues.
I'm going to avoid my usual tendency to write an itemized list;
hopefully this will not make things too confusing.  (I haven't read
all of the replies yet, so some of this might repeat earlier points.)

The first thing that bugged me is the global identifiers.  (And I
think that the "libgtk2" part is just a byproduct of this.)  IIUC,
these identifiers will translate to toplevel directories, which means
that these toplevel names become precious real estate.  Obviously
there should be a way to identify a package (the url that it was
downloaded from), and obviously there should be an error if two
packages try to install the same file.  But lumping the two together
seems like a mistake.  IIRC, we had a concrete example of this --
data/functional, which would be a separate package from data.
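
To illustrate (the packages and paths here are made up): keeping
package identity and installed paths separate would allow something
like

    package "data"             identity: http://example.org/data.tgz
      installs data/queue.rkt, data/heap.rkt
    package "data/functional"  identity: http://example.org/data-fun.tgz
      installs data/functional/queue.rkt

where neither package owns the toplevel "data" name, and the only
error is two packages trying to install the same *file* (say, both
installing data/queue.rkt).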

One major problem that planet tries to tackle (probably its main
feature) is versions.  It started in a way that sounded promising --
avoid "DLL hell" -- something that many people suffer from.  But the
result created some of planet's obvious problems too.  The versioning
system makes things very complicated and confusing; tracking
dependencies became a chore -- something you can easily end up
fighting with instead of thanking; and in general the threshold for
writing packages becomes so high that it prevents contributions.  It
does solve the "DLL hell" problem, but brings in problems that are
bad in a very similar way (eg, when you end up having three copies of
schemeunit around).  IMO it looks like it *had* the desired effect of
making things stable, but the cost is just too high, with some
packages stagnating and others never turned into planet packages at
all.

So the first thing that strikes me as a bad idea is trying to fight
with versions in a way that can lead to the same problems.  You end up
with an elaborate system that allows multiple versions and that
inevitably leads to some of the same problems.  For example, the
".links" directory complicates things to the point where you're forced
to use a tool to know where bindings are coming from.  This is
probably a good point to list some principles:

* It should be *simple* to maintain code, and code management should
  be robust.
  - Adding an extra layer of indirection at the point of requiring
    modules is dangerous for both of these aspects.  It makes it hard
    to maintain code without tools (your "I think the biggest
    problem" paragraph), and it introduces another layer that can
    break (we'd be dealing with module requires that are translated
    to link files, and plain requires that are in turn still
    translated to the built-in primitive `#%require').
  - Actually, since the `require' -> `#%require' translation is
    already there, and it already makes it possible to extend things
    inside racket, I'd like to see it extended rather than building
    something separate on top.  (See the sketch after this list.)
  - (I view Jos's request for supporting "simple users" as part of
    this.  People who don't care for multiple versions should find
    things almost as simple as they are today.)
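
To make the "extend `require'" point concrete: `require' is already
user-extensible via require transformers, so a package form could
live there instead of in a separate layer.  A toy sketch -- the
`from-package' form and its expansion rule are entirely made up, but
`define-require-syntax' itself is a real racket facility:

    #lang racket/base
    (require racket/require-syntax
             (for-syntax racket/base))

    ;; Expand (from-package <pkg> <mod>) into the plain module path
    ;; <pkg>/<mod>, so after expansion tools see an ordinary require.
    ;; A real version would consult the package installation instead.
    (define-require-syntax (from-package stx)
      (syntax-case stx ()
        [(_ pkg mod)
         (datum->syntax
          stx
          (string->symbol
           (format "~a/~a" (syntax-e #'pkg) (syntax-e #'mod))))]))

    ;; (require (from-package data functional))
    ;; == (require data/functional)

The point is that everything still bottoms out in `#%require', with
no extra translation layer that can break.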

Re your ".links" suggestion (sec 4.3), it sounds pretty bad regardless
of the above.  IIUC, there will be a file for each module that your
package is using -- so if there's about a 100 common modules, they're
going to exist as link files in almost all packages, which quickly
makes the whole thing explode.

* Minor principle: don't rely on filesystem conventions and/or
  features that are not "nearly universal".  You made the point that
  this avoids symlinks (which IMO is not an advantage but a
  requirement), and for the same reason ".links" is a bad name
  (leading-dot "hidden" files are a unix convention, not a universal
  one).

Going back to your list of features -- one such feature is being able
to "freeze" a specific package.  Is this *really* necessary?  It
looks to me like exactly the kind of fancy feature that planet
implements and that is almost never used.  To be more specific: if I
care about a specific web server feature because I have it running on
some server, then I should be an obvious client for this feature, but
I just know that when I'm in that situation I'll do the same thing I
do now: have a fixed full installation that I won't touch.  (Note
that we have exactly this situation with the brown server, so replace
"I" with a collective "we" there.)  After all, it seems nice to be
able to upgrade parts while keeping the web server version fixed for
the stability of my scripts -- but I wouldn't trust things to keep
working with the same web server version if the core version is
different.

To continue with your text, one thing that strikes me as a (negative)
result is the amount of effort that dealing with versions leads to --
it's probably bigger than planet's, and planet took a bunch of time
to stabilize.  We get a huge number of little link files, complicated
rules to track where bindings are coming from, and nightmare
implementation details that make me twitch (that was seriously my
first reaction when I saw "name mangling").

* Implicit name mangling is *bad*.

Even at this meta-design level, I already see hacks to deal with
things -- like the "strawman implementation" bit with a single heap.
IIUC, your heaps correspond to places where packages can be installed
-- one is the main installation, one is the user-specific one, and
other likely heaps are "site-racket" kinds of directories and
/usr/local ones.  So this implementation of copying everything to a
single heap means that it must live at the most specific directory
level, which is my home directory.  And of course that's not specific
enough, since I use my home directory on multiple machines, so I
expect requests to make this heap depend on the platform and on the
racket core version, and obviously on some $RACKETHEAPDIR variable.
And now I have some ~/.racket/heap/<version>/<arch>/<blah> directory
with files that are copies of other files, or files that are shadow
stubs for other files, and I get to the happy point where I really
need a GPS to navigate my sources.
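
Spelled out (all names invented for illustration), that heap ends up
looking something like

    ~/.racket/heap/5.1/x86_64-linux/
      data/queue.rkt   <- copy of a file from some package
      net/url.rkt      <- shadow stub pointing at the main installation
      ...

multiplied by every version/platform combination that I use.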

So I completely agree with your conclusion that this is bad, but I
think that I view it much more negatively than you do.  The thing is
that I don't have a good solution to offer instead -- and together
with my dislike of dealing with multiple versions in any way (which
you seem to agree with, at least partially, when you say that the
common case is one version of each package), my conclusion is to make
things simple again: avoid trying to solve the versioning problem.

Besides the obvious problems that you mention in the current state of
the core (huge, people get lots of stuff that is irrelevant for them,
etc etc -- problems that I agree with strongly enough that I have a
sore virtual throat from years of screaming about them), there is one
big advantage to it.  That advantage is that there is exactly one
version of each package.  As a user, I get a new installer and
everything is updated.  So I'd like to see a system that preserves
this simplicity+robustness combination -- which means that there is
still one version of each "core package" in the plt repo, but the
distribution becomes, uh, distributed.

Re distribution and URLs etc, I have a bunch of comments on that too,
but I'll mostly defer to what I suggested in the past re a
require-url.  One quick point: there's no need for a new ".rkb"
extension.  I obviously love the use of "racket ball" (I wanted to
use it long ago), but it's fine to keep it as a convention and let
the actual transport be whatever gets used -- some .tgz or .zip file,
or some url that has the directory tree.  (A point to consider here
is that some parts will inevitably be packaged in foreign formats --
rpms, debs, msis, or whatever.)  The same holds for the scheme part
of the url -- I think that it would be a mistake to rely on HTTP for
anything concrete (like specifying redirections), or to rely on HTTP
exclusively (which would rule out things like git:// urls).
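
To make that flexibility concrete, the kind of thing I have in mind
(the `url' require form here is purely illustrative, not a worked-out
design) is that the url is the package's identity, and the transport
is inferred from whatever is actually there:

    (require (url "http://example.org/pkgs/foo"))      ; directory tree
    (require (url "http://example.org/pkgs/foo.tgz"))  ; tarball
    (require (url "git://example.org/pkgs/foo"))       ; another scheme

Nothing in the design needs to care whether the bytes arrived over
HTTP, git, or a local filesystem.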

A specific result of keeping things flexible like that is that
there's no need to do anything about a central repository (your
section 6) right now -- it stays flexible enough to design and
implement that part later.  More importantly, it will allow external
subgroups to form their own conventions and infrastructure for their
packages, so this part can be done based on existing lessons instead
of investing precious effort in implementing some review/reputation
system, only for it to turn out that (for example) github implements
something extremely convenient and open with a nice api...  The same
goes for dealing with security -- it could be left out for now, and
it would be easy to add later or to adopt some popular solution that
forms by itself.

It's true that the "informing" point (sec 7) is important, but this
can (and IMO should) start with just some knowledge of the plt
packages that will be selected in a post-core-installation dialog or
command line, and later be expanded to accommodate more places (which
should allow code to finally drift back out of the core plt repo).
So I think that a much healthier approach is to start with revising
packages, then breaking the core to pieces, and only then get to the
rest of the cake.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

