[racket-dev] proposal: `data' collection

From: Eli Barzilay (eli at barzilay.org)
Date: Fri Jul 2 13:50:04 EDT 2010

On Jul  2, Matthias Felleisen wrote:
> 1. Size matters even if it doesn't really matter. Seeing these
>    numbers makes it clear that our download will be called a
>    behemoth and an ms style colossus. We all know that in the end,
>    this number is irrelevant. PLT comes with a lot of extra stuff
>    and that stuff is useful. But among the hacker crowd this number
>    can turn things against us and they influence a lot of the rest
>    of the world.

Well, these things do come with extra stuff too -- except that they're
split into individual packages (in the OS package sense, an RPM in my
case).  Racket is pretty much unsplittable ATM.


> 2. And yes, even more importantly, the connectivity graph among
>    collects may concern us. Why however? Collects aren't modules and
>    it is easily possible to work in parallel on interconnected
>    collects.

Because code maintenance becomes much more difficult when things are
very interconnected?  Your collection loses a clear upstream and
downstream side -- usually, when your code runs into a problem it's
either a problem of you not doing what you should (your downstream
clients complain), or you're not getting what you should (the upstream
libraries you use are broken).  (And contract blames are playing on
just this.)  But if my collection has a problem with yours, and they
both depend on each other, then there's no clear interface provider
and consumer -- it's true that at the actual module level there are no
cycles, but since your code is in a single collection, there *is* a
strong dependency among the modules inside it, so there is really no
clear consumer and provider in this case.  (IIRC, this was (or is?)
the problem with the stepper and drscheme having a dependency in the
wrong direction.)


> I think this really gets at the questions, 
> 
>   what is the purpose of a collect? 
> 
> Even if we ignore the distribution idea, it should concern us that
> we don't have a concise answer for that. Even Java seems to have
> one. Why can't we?

(+1, and +1.)


> 3. I still do not understand what Eli calls a package. 
>   -- Is it more than a module and less than a collect? 
>   -- Is it a bunch of collects? 
>   -- Is it something you want to distribute? 

How about this: a package is a bunch of code (= modules) *with* a
clear (or well defined) purpose, that does not form cycles with any
other package. --?

This could be trivially reduced to each module being defined as a
package -- but the purpose is a key feature here.  It should forbid
considering a "private" module as a package on its own, and a bunch of
modules that are all implementing some given system (like "racklog" or
"htdp") should all be considered a single package.

This is where it gets kind of fuzzy, so maybe it will help to think of
it as a kind of a declaration: I'm telling my clients (authors of code
that uses my "package") that I'll never use their code, and I'm
telling my providers that they should never use my code.  This is
obviously dynamic -- it might be that some of the functionality that
I'm providing is useful enough that it should move up to be part of
one of my providers (or split out into its own new package that
becomes a provider); and it might be that one of my consumers is
writing code that would make my life easier.  In both of these cases,
I think that the *proper* way to tackle the changes is to move code
between packages (even if it keeps the same owner) -- *not* to create
the connections and leave the code where it is.

And to try a concrete example: Swindle has that `echo' thing, that
just might be so great (it's not) that we'll want it in the core.
Doing this is easy: just add (require (only-in swindle/misc echo))
into "racket/main.rkt" and you're done.  But this means that now the
`swindle' collection is part of the `racket' collection in the sense
that you cannot install the latter without the former, and that's a
whole bunch of code that you didn't want in `racket'.  And if we're in
the happy stage where we have a small distribution with additional
packages that people choose from -- then we need to choose whether to
silently make the `racket' collection bigger, or force people to get
the swindle package because it's needed to resolve dependencies.
Things become way better if you just take the `echo' code itself, and
move it into `racket', so no inter-collection (actually inter-package)
dependencies change.  Even if I'm still its main maintainer, you can
fix a bug or extend it or change it -- and there's no problem because
it is the `racket' package that you're maintaining too; whereas if the
code stays where it is, then you're more likely to ask me to change
it, which means that inside the swindle code itself I need to wear two
hats depending on which lines I change.  Or say that we had version
numbers for packages -- I could keep incrementing the swindle version
whenever I wanted to, but if the `echo' code stays, it means that an
increment to swindle affects the racket collection.
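
[A rough sketch of what that shortcut does at the package level, under
assumptions: the dependency tables below are made up for illustration
(they only assume that swindle is built on top of racket), and
`find-cycle' is a hypothetical helper, not an existing tool.]

```racket
#lang racket/base
;; Hypothetical package-level dependency tables: package -> packages it uses.
;; Assumption for illustration only: swindle is built on top of racket.
(define before
  (hash 'racket  '()
        'swindle '(racket)))

;; The same table after adding the `require' shortcut to "racket/main.rkt":
;; racket now uses swindle as well.
(define after-shortcut
  (hash 'racket  '(swindle)
        'swindle '(racket)))

;; Depth-first search over the table.  Returns a list of the packages on
;; some dependency cycle, or #f if the table has no cycle.
(define (find-cycle deps)
  (define done (make-hasheq))   ; packages already explored and cycle-free
  ;; path: packages on the current search stack, most recent first
  (define (visit pkg path)
    (cond
      [(memq pkg path)
       ;; pkg is already on the stack: collect the stack down to pkg
       (let loop ([p path] [acc '()])
         (if (eq? (car p) pkg)
             (cons pkg acc)
             (loop (cdr p) (cons (car p) acc))))]
      [(hash-ref done pkg #f) #f]
      [else
       (define found
         (for/or ([d (in-list (hash-ref deps pkg '()))])
           (visit d (cons pkg path))))
       (unless found (hash-set! done pkg #t))
       found]))
  (for/or ([pkg (in-list (hash-keys deps))])
    (visit pkg '())))

(find-cycle before)          ; no cycle: a clear provider and consumer
(find-cycle after-shortcut)  ; a list naming both racket and swindle
```

Moving the `echo' code into `racket' instead leaves the first table
unchanged, which is the point of the argument above.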

[Hopefully, it's clear why moving the code is much better than keeping
it where it is -- but there's obviously a cost involved in the move
itself.  It will use a different language now, it will be documented
differently (in the `echo' case, very), tests will need to move, and
the code is likely to be overall revised and reevaluated, and very
likely modified, possibly even in a way that I (as the swindle author)
will not want.  Since you desperately want it, and I'm the one who
wrote it, this whole work will need to be done by one of us.  Since
we're both busy with other things, it would be temptingly easy to just
defer it for later -- just add that `require' in and be done with it.
IMO, the above problems are real, which makes this easy-way-out
solution an offence.  As things stand, nobody will see it since
they're distributed together anyway -- but when we run into the above
problems and when they get to the point that they *require* a solution
(eg, swindle gets too broken and is dropped, its copyright changes,
its author moves to Tibet and becomes a monk), *someone* will need to
step up and solve them.  That someone will go over the code and move
it, deal with the documentation, with the tests, fix bugs, and of
course wash the windows and scrub all the pots that were left after
the cooking for last night's party.  (And that gets to why I dislike
unstable, even if someone else will do that laundry.)]

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the dev mailing list.