[racket-dev] proposal: `data' collection

From: Eli Barzilay (eli at barzilay.org)
Date: Thu Jun 24 14:07:30 EDT 2010

On Jun 24, Matthias Felleisen wrote:
> On Jun 23, 2010, at 5:37 PM, Sam Tobin-Hochstadt wrote:
> 
> > To clarify, I'm proposing that this be a part of the "core"
> 
> I agree with this goal and the name.

[BTW, when I talked about part of the core earlier, the meaning was
the actual `racket' collection -- the area where it's difficult to get
into because you're running into all kinds of circularity problems.
IIUC, Sam's meaning is more of a "core distribution", which is much
easier to deal with.  (And I'm not offering any opinion about TR being
more in the core in the former meaning -- if we take types seriously,
then that's probably the better way to go (with the untyped language
being layered on top), but that's a much more fundamental change than
distribution issues.)]


> We could call it 'collections' hierarchy as in Java, but I don't
> think that this is a good name. Ideally, I'd like to call it
> data-structure but that isn't a good path element.

+1 on both.  `data' does seem to me better than both of these, but I
still dislike it since it's a vague name like "etc".  Here's an
attempted clarification of what bothers me about it, and possibly
something to think about before August.

Currently, we use the toplevel collections as units of coherent pieces
of code -- they match both how the code is layered (at least it
should) and how it's distributed.  Yes, the plan for a minimal
distribution is still not concrete -- but we're already doing that.
For example, planet's granularity is by collection, and most of the
distribution specs are in terms of collections too.  The bottom
line is that currently we have "top level collection" as something
that roughly corresponds to "a package".

Now, a name as generic as `data' is going outside of this role.  It's
likely to have in there general "core things" like `data/list' as well
as specific things like some queue that is optimized for a specific
task or perhaps a persistent set that is backed by a database.
Because of this I view `data' as a bad choice -- at least as long as
we have the current meaning of "a collection".  Even if the decision
was not made consciously, I think that the fact that the core data
types are in the `racket' collection is a direct byproduct of this
issue too.  (There's also the fact that such libraries might need to
behave differently -- for example, not getting an error in terms of
`vector-length' when the original call was some Honu `x.length()'.)

Perhaps the role of (toplevel) collections should change.  More
likely, it's about time we decide -- concretely -- on defining
"packages".  These would be relevant for planet, for minimizing the
distribution, and a whole bunch of other issues that depend on this.

It seems reasonable to define these packages somehow as either
toplevel collections or complete subtrees of them, with some way of
specifying which directory (or maybe a group of sibling directories)
is the "root" of a package.  But this requires modifying the current
way that toplevel collections are spliced together -- for example, you
should be able to install a user-specific data/foo package
(something that is not possible now).

In that case, a generic name like `data' works out much better.  Since
that's a separate issue, my objection to `data' is based on the
current state of the system.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the dev mailing list.