[racket-dev] new release policy

From: Jay McCarthy (jay.mccarthy at gmail.com)
Date: Thu Aug 12 21:45:43 EDT 2010
Previous message: [racket-dev] new release policy
Next message: [racket-dev] new release policy
Messages sorted by: [date] [thread] [subject] [author]
On Thu, Aug 12, 2010 at 6:46 PM, Eli Barzilay <eli at barzilay.org> wrote:
> On Aug 12, Sam Tobin-Hochstadt wrote:
>> On Thu, Aug 12, 2010 at 7:28 AM, Eli Barzilay <eli at barzilay.org> wrote:
>> >
>> > One thing about stability that bugs me is pushing changes and
>> > extensions that are likely to change.  For example, I'm worried
>> > about Jay's push for a number of new features as a response to a
>> > blog post, it looks like coding in haste that is likely to change.
>>
>> This is something that the `unstable' collection can be useful for.
>
> +1, in general.

Since some of these features were already available in the unstable
collection, it's clear that useful functionality there is not visible
enough for beginners.

> On Aug 12, Matthew Flatt wrote:
>> At Thu, 12 Aug 2010 07:28:34 -0400, Eli Barzilay wrote:
>> > One thing about stability that bugs me is pushing changes and
>> > extensions that are likely to change.  For example, I'm worried
>> > about Jay's push for a number of new features as a response to a
>> > blog post, it looks like coding in haste that is likely to change.
>>
>> I strongly support Jay's current push. We don't need to wait for a
>> new package system to get our data-structure libraries in better
>> shape.
>
> My point was completely unrelated to packaging.  (Jay added it to the
> core racket language anyway.)

Perhaps his point was that the "ideal" to handle these additions would
be a well publicized and easy to install package.

> On Aug 12, Jay McCarthy wrote:
>>
>> As a final note, I don't intend this to be a mere pandering to this
>> blog post. When I was reading the blog post I found myself saying,
>> "Ya but you could implement that so easily" or "Ya, it is annoying
>> to use that pattern", etc. Hearing myself say that told me that I've
>> always wanted many of these functions but I always implemented them
>> in a one-off fashion or used a boilerplate pattern.
>
> I should explain why I said the above.  (Yes, this will be verbose.
> Why?  Because my short one-sentence version was taken as a baseless
> whine, so I'll go through the details that I'd think through before
> doing this.  Perhaps this is too much, but I'd like to think it isn't,
> especially for code in racket/private.)
>
> First, I completely agree with this specific mode of operation (and
> have added several things in a similar way) -- but the thing that I'm
> careful with is adding things that might be regretted later on.  One
> place that I'm not too happy about is `fold-files', which I'll use as
> an example.  IMO, such regretting can lead to one of these:
> * Coming up with a better interface, but keeping the old one for
>  compatibility (keeping `fold-files' as is, and keeping it working),
> * Same, but removing the old one (either a new `fold-files' with a
>  different interface, or a new name instead),
> * Improving the interface in a desired way, while keeping things
>  backward compatible.
>
> The last one is almost always the best way to go, and I view keyword
> arguments as an essential feature because it makes this option
> practical.  In the `fold-files' case it is likely that problems could
> be fixed by adding more keywords, and possible changing the default
> behavior (which is more work but even that is reduced with keywords,
> since existing call sites only need to be updated with a new keyword).
> Without keywords, this would not be possible.
>
> But dealwing with sequences is not something that can be deferred for
> later with some keyword -- at least not as easily as with a function.
> And in light of this I'd be much more careful about adding things that
> might turn out to be more regrets.
>
> As for the actual thing that I'm talking about -- sequences and making
> them more list like -- I should have explained why I think this
> addition is not something that is clearly desired in my view.
>
> Yes, there is definitely a problem with sequences and lists -- the
> first is way more complicated to deal with than the second, and Mark's
> post did highlight this issue.  One solution is what Jay did -- do the
> extra work of building the list-like sequence functions, and given
> enough primitives, the rest would be easy to add if/when needed.  But
> what about these issues:
>
> ** The addition is *slow*.  Very slow.  I wrote a test program to sum
>   up the integers from 0 to 2000000 -- and I get these numbers (test
>   code attached below):
>
>   - Plain `for' loop:          cpu time: 9 real time: 10 gc time: 0
>   - Using a sequence value:    cpu time: 222 real time: 222 gc time: 0
>   - Using `seqn-cdr' on range: cpu time: 1142 real time: 1142 gc time: 211
>   - `seqn-cdr' + value:        cpu time: 1141 real time: 1141 gc time: 210
>
>   I view this as a real problem -- performance issues at the 100x
>   slowdown are bugs at the library level, but at the core level
>   they're *bad* bugs.
>
> ** At the conceptual level, there is a major problem with for
>   sequences -- if you care about performance, they're second-class
>   values.  As seen above, using a sequence value instead of the form
>   directly in the for loop makes it 20x slower.  This is kind of ok
>   as things currently stand (pre Jay's afdditions), since most people
>   who deal with sequences are aware of the issue and propagate the
>   same cost tradeoff (providing both a syntactic and a value
>   variant).  Making sequence values be more list-like leads to
>   emphasizing this problem, and this can lead to problems that end up
>   in regretting something => overall reduced stability when there's
>   some new interface.  And specifically in this case, a 100x slowdown
>   is easily noticeable.
>
> For the first of these (related) problems, you can say that it's a
> performance problem that can be improved out later, with no interface
> changes.  The question is how much it can improve.  (One way to make
> it faster would be to provide macro versions of the whole set of
> sequence operations, and this ends up in double the amount of code.)
>
> Regardless, the second problem cannot be improved -- at least not
> without some major redesign.  But obviously this means that the
> problem is still there regardless of extensions -- and making sequence
> values easier to handle will eventually happen anyway.  The question
> is whether there is some other way to provide this functionality
> (list-like sequences) -- and in this case Mark talks explicitly about
> the Clojure feature that makes sequences very easy: lazy lists.  So,
> I've tried to get a rough idea of the costs for this simple example of
> a range, and I got:
>  cpu time: 1581 real time: 1581 gc time: 868
> This is still bad, but at the same neighborhood -- so maybe a better
> direction for making sequences nicer is to try making them (the value
> versions) into lazy lists?  It's true that this is close to the
> 100x-worse version of `seqn-rest', but if they're both close, then why
> not explore making the lazy list version run faster?
>
> But if I switch from `lazy' to plain thunks for the lazy list (which
> is probably more appropriate here anyway), I get
>  cpu time: 113 real time: 113 gc time: 19
> and this is even better than the current (in-range ...) -- and this is
> not even including tricks like Shriram mentioned of mixing laziness
> and strictness (eg, a range could have a delayed thunk at only every
> 10th cdr).  So this *does* look like a better way to get this done:
> speed up sequence values and make them easy to manage.  Why not try
> that or at least *talk* about trying that before adding a chunk of
> code that fights with the current windmills, making future
> improvements even harder to do?
>
> (And at this point I can see how people can think that such an
> improvement can be done regardless of the current extensions -- but my
> guess is that nobody will touch it, and things will stay as they are
> now.  For now.)
>
> (I'm obviously hoping that this guess will turn out wrong as a result
> of writing about it.  But probably not.  I'll shut up now.)

I parse your comments like this:

- We can't do these sequence functions fast.
- When we didn't provide them, people complained that they were missing.
- When we provide them slowly, people will complain that Racket is slow.
- It is worse to be slow than featureless.

I think it is worse to be featureless. Of the additions I made, I
believe that only seqn-cons, seqn-rest, seqn-tail, seqn-append,
seqn-map, seqn-filter, and seqn-add-between will have the speed
problem. The other functions that produce values will be a little
slower than their equivalents rewritten directly in 'for's, but
sometimes it is nice to write: (seqn-andmap even? s) rather than:
(for/and ([e s]) (even? e)).

I wager that seqn-cons, seqn-rest, and seqn-tail could be made very
faster if it were possible for a do-sequence to turn over control to
another sequence that had already started. seqn-append would be helped
a little.

I think it's worth it for them to be there despite the problems you've
mentioned. I don't think they're like fold-files (which I agree has a
bad interface); they mimic the time honored list API.

Jay

-- 
Jay McCarthy <jay at cs.byu.edu>
Assistant Professor / Brigham Young University
http://teammccarthy.org/jay

"The glory of God is Intelligence" - D&C 93
Posted on the dev mailing list.
Previous message: [racket-dev] new release policy
Next message: [racket-dev] new release policy
Messages sorted by: [date] [thread] [subject] [author]