[racket-dev] new release policy

From: Eli Barzilay (eli at barzilay.org)
Date: Thu Aug 12 20:46:09 EDT 2010

On Aug 12, Sam Tobin-Hochstadt wrote:
> On Thu, Aug 12, 2010 at 7:28 AM, Eli Barzilay <eli at barzilay.org> wrote:
> >
> > One thing about stability that bugs me is pushing changes and
> > extensions that are likely to change.  For example, I'm worried
> > about Jay's push for a number of new features as a response to a
> > blog post, it looks like coding in haste that is likely to change.
> This is something that the `unstable' collection can be useful for.

+1, in general.

> It allows us to get some experience with a new API without
> foreclosing the possibility of changes.  Of course, that means it
> won't be as immediately useful for users outside of the collects,
> but I think a little patience can pay off here.

(And here too.)

On Aug 12, Matthew Flatt wrote:
> At Thu, 12 Aug 2010 07:28:34 -0400, Eli Barzilay wrote:
> > One thing about stability that bugs me is pushing changes and
> > extensions that are likely to change.  For example, I'm worried
> > about Jay's push for a number of new features as a response to a
> > blog post, it looks like coding in haste that is likely to change.
> I strongly support Jay's current push. We don't need to wait for a
> new package system to get our data-structure libraries in better
> shape.

My point was completely unrelated to packaging.  (Jay added it to the
core racket language anyway.)

On Aug 12, Jay McCarthy wrote:
> As a final note, I don't intend this to be a mere pandering to this
> blog post. When I was reading the blog post I found myself saying,
> "Ya but you could implement that so easily" or "Ya, it is annoying
> to use that pattern", etc. Hearing myself say that told me that I've
> always wanted many of these functions but I always implemented them
> in a one-off fashion or used a boilerplate pattern.

I should explain why I said the above.  (Yes, this will be verbose.
Why?  Because my short one-sentence version was taken as a baseless
whine, so I'll go through the details that I'd think through before
doing this.  Perhaps this is too much, but I'd like to think it isn't,
especially for code in racket/private.)

First, I completely agree with this specific mode of operation (and
have added several things in a similar way) -- but the thing that I'm
careful with is adding things that might be regretted later on.  One
place that I'm not too happy about is `fold-files', which I'll use as
an example.  IMO, such regretting can lead to one of these:
* Coming up with a better interface, but keeping the old one for
  compatibility (keeping `fold-files' as is, and keeping it working),
* Same, but removing the old one (either a new `fold-files' with a
  different interface, or a new name instead),
* Improving the interface in a desired way, while keeping things
  backward compatible.

The last one is almost always the best way to go, and I view keyword
arguments as an essential feature because it makes this option
practical.  In the `fold-files' case it is likely that problems could
be fixed by adding more keywords, and possible changing the default
behavior (which is more work but even that is reduced with keywords,
since existing call sites only need to be updated with a new keyword).
Without keywords, this would not be possible.

But dealwing with sequences is not something that can be deferred for
later with some keyword -- at least not as easily as with a function.
And in light of this I'd be much more careful about adding things that
might turn out to be more regrets.

As for the actual thing that I'm talking about -- sequences and making
them more list like -- I should have explained why I think this
addition is not something that is clearly desired in my view.

Yes, there is definitely a problem with sequences and lists -- the
first is way more complicated to deal with than the second, and Mark's
post did highlight this issue.  One solution is what Jay did -- do the
extra work of building the list-like sequence functions, and given
enough primitives, the rest would be easy to add if/when needed.  But
what about these issues:

** The addition is *slow*.  Very slow.  I wrote a test program to sum
   up the integers from 0 to 2000000 -- and I get these numbers (test
   code attached below):

   - Plain `for' loop:          cpu time: 9 real time: 10 gc time: 0
   - Using a sequence value:    cpu time: 222 real time: 222 gc time: 0
   - Using `seqn-cdr' on range: cpu time: 1142 real time: 1142 gc time: 211
   - `seqn-cdr' + value:        cpu time: 1141 real time: 1141 gc time: 210

   I view this as a real problem -- performance issues at the 100x
   slowdown are bugs at the library level, but at the core level
   they're *bad* bugs.

** At the conceptual level, there is a major problem with for
   sequences -- if you care about performance, they're second-class
   values.  As seen above, using a sequence value instead of the form
   directly in the for loop makes it 20x slower.  This is kind of ok
   as things currently stand (pre Jay's afdditions), since most people
   who deal with sequences are aware of the issue and propagate the
   same cost tradeoff (providing both a syntactic and a value
   variant).  Making sequence values be more list-like leads to
   emphasizing this problem, and this can lead to problems that end up
   in regretting something => overall reduced stability when there's
   some new interface.  And specifically in this case, a 100x slowdown
   is easily noticeable.

For the first of these (related) problems, you can say that it's a
performance problem that can be improved out later, with no interface
changes.  The question is how much it can improve.  (One way to make
it faster would be to provide macro versions of the whole set of
sequence operations, and this ends up in double the amount of code.)

Regardless, the second problem cannot be improved -- at least not
without some major redesign.  But obviously this means that the
problem is still there regardless of extensions -- and making sequence
values easier to handle will eventually happen anyway.  The question
is whether there is some other way to provide this functionality
(list-like sequences) -- and in this case Mark talks explicitly about
the Clojure feature that makes sequences very easy: lazy lists.  So,
I've tried to get a rough idea of the costs for this simple example of
a range, and I got:
  cpu time: 1581 real time: 1581 gc time: 868
This is still bad, but at the same neighborhood -- so maybe a better
direction for making sequences nicer is to try making them (the value
versions) into lazy lists?  It's true that this is close to the
100x-worse version of `seqn-rest', but if they're both close, then why
not explore making the lazy list version run faster?

But if I switch from `lazy' to plain thunks for the lazy list (which
is probably more appropriate here anyway), I get
  cpu time: 113 real time: 113 gc time: 19
and this is even better than the current (in-range ...) -- and this is
not even including tricks like Shriram mentioned of mixing laziness
and strictness (eg, a range could have a delayed thunk at only every
10th cdr).  So this *does* look like a better way to get this done:
speed up sequence values and make them easy to manage.  Why not try
that or at least *talk* about trying that before adding a chunk of
code that fights with the current windmills, making future
improvements even harder to do?

(And at this point I can see how people can think that such an
improvement can be done regardless of the current extensions -- but my
guess is that nobody will touch it, and things will stay as they are
now.  For now.)

(I'm obviously hoping that this guess will turn out wrong as a result
of writing about it.  But probably not.  I'll shut up now.)

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.