[racket-dev] proposal: `data' collection

From: Eli Barzilay (eli at barzilay.org)
Date: Wed Jun 30 21:44:10 EDT 2010

On Jun 30, Sam Tobin-Hochstadt wrote:
> On Wed, Jun 30, 2010 at 9:02 PM, Eli Barzilay <eli at barzilay.org> wrote:
> > On Jun 30, Sam Tobin-Hochstadt wrote:
> >> On Wed, Jun 23, 2010 at 2:29 PM, Sam Tobin-Hochstadt <samth at ccs.neu.edu> wrote:
> >> > At the Northeastern PLT lunch today, I proposed adding a top-level
> >> > `data' collection, for all manner of data structures.
> >>
> >> Based on the discussion,
> >
> > There was no discussion.  I posted the main problem with that, which
> > you never replied to.
> 
> I don't believe you pointed out a problem.  There was discussion of
> what sense of "core" we mean, which I clarified. As demonstrated by
> the `syntax' collections, this doesn't pose a problem.

Below is what I wrote, which you replied to as if the only issue is the
name of the collection.  The name is just a symptom -- which will go
away *if* we have a solution to separating collections.  If not, then
such a generic collection will be a problem regardless of the name.

And just in case you'll want to ignore the actual content of this:
(a) I'm not objecting to `data' as a name, (b) I *want* a good
solution for this problem, and have wanted one for a while, (c) if
there is a solution for this, then `data' (while not great) works
as well as in the Haskell example you mentioned, but as things stand,
it is a problem regardless of the name.

-------------------------------------------------------------------------------

Matthias said:

> We could call it 'collections' hierarchy as in Java, but I don't
> think that this is a good name. Ideally, I'd like to call it
> data-structure but that isn't a good path element.

+1 on both.  `data' does seem to me better than both of these, but I
still dislike it since it's a vague name like "etc".  Here's an
attempted clarification of what bothers me about it, and possibly
something to think about before August.

Currently, we use the toplevel collections as units of coherent pieces
of code -- they match both how the code is layered (at least it
should) and how it's distributed.  Yes, the plan for a minimal
distribution is still not concrete -- but we're already doing that.
For example, planet's granularity is by collection, and most of
the distribution specs are in terms of collections too.  The bottom
line is that currently we have "top level collection" as something
that roughly corresponds to "a package".

Now, a name as generic as `data' is going outside of this role.  It's
likely to have in there general "core things" like `data/list' as well
as specific things like some queue that is optimized for a specific
task or perhaps a persistent set that is backed by a database.
Because of this I view `data' as a bad choice -- at least as long as
we have the current meaning of "a collection".  Even if the decision
was not made consciously, I think that the fact that the core data
types are in the `racket' collection is a direct byproduct of this
issue too.  (There's also the fact that such libraries might need to
behave differently -- for example, not getting an error in terms of
`vector-length' when the original call was some Honu `x.length()'.)

Perhaps the role of (toplevel) collections should change.  More
likely, it's about time we decide -- concretely -- on defining
"packages".  These would be relevant for planet, for minimizing the
distribution, and a whole bunch of other issues that depend on this.

It seems reasonable to define these packages somehow as either
toplevel collections or complete subtrees of them, with some way of
specifying which directory (or maybe a group of sibling directories)
is the "root" of a package.  But this requires modifying the current
way that toplevel collections are spliced together -- for example, you
should be able to install a user-specific data/foo package
(something that is not possible now).

In that case, a generic name like `data' works out much better.  Since
that's a separate issue, my objection to `data' is based on the
current state of the system.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the dev mailing list.