[plt-scheme] v4 questions

From: Eli Barzilay (eli at barzilay.org)
Date: Sun Mar 30 23:08:14 EDT 2008

On Mar 30, Mark Engelberg wrote:
> On Sun, Mar 30, 2008 at 2:04 PM, Eli Barzilay <eli at barzilay.org> wrote:
> >  I don't know what that means -- pure functional types like hash
> >  tables are really hard, and offer little advantage over the
> >  impure ones if you have them (at least IME).  And if you talk
> >  about vectors, then things get even stickier (are you talking
> >  about a real vector, or about something that behaves like one?).
> >  There are many choices and I don't think that there is one that
> >  clearly sticks out.  Especially if you want cons/first/rest for
> >  these things (and I don't even know how would you define them for
> >  vectors and hash tables).
> 
> I'm not talking about a true vector, just something that has
> improved performance for random-access and deque operations.

OK, so like I said -- I don't know of a single clear winner here.
What this means (to me) is that the user of the library should be able
to determine which data structure to use, and this means that it's
much much bigger than a simple library for immutable vector-like
containers.  (But perhaps something like wb trees or something like
that could be added.)


> The usual arguments about the benefits of immutability apply: better
> ease of testing, better easy of analyzing correctness of code,
> better thread safety.  But most importantly to me, the style of
> writing the code would then be consistent with list-based code.

I think that all of these can easily be satisfied with an imperative
implementation.  The real question is, for example, how do you deal
with references to old copies for some functional value if you need
to, and in that case we fall back to the above issue.


> I frequently find myself in the following situation: I've written a
> lot of code that relies on lists.  At some point, I realize that I
> need to do an operation that is very slow on lists.  Now, I've got a
> problem.  Converting over to something like a vector requires
> massive changes, not only to the syntax, but to the structure of my
> program.

But that points at the inherent differences between the two
structures.  Almost any vector-like structure that I can think of will
perform much worse when you build a large vector with individual cons
operations.  In other words, I think that the problem that you're
describing is much bigger than the particular choice of a PLT library
implementation.  (But I think that in most cases you can argue that if
a programmer switches from a list to a vector, then the problem and/or
the design were wrong to begin with, to such an extent that a major
refactor is inevitable.)


> In my ideal programming world, up at the top of a given module, I
> could specify what I want my default data structure to be, e.g.,
> #lang scheme
> #list basic-list
> 
> Then, when I'm done with my module, I can easily test and time the
> code using various data structures that make different tradeoffs:
> #lang scheme
> #list Hinze-Patterson-finger-tree
> 
> This would automatically convert my cons and list constructors and the
> reader for lists, so that they would create the data structure that is
> defined in the module to be the default.  Presumably the other
> functions, such as first, rest, map, fold, comprehensions, pattern
> matching, etc., would be polymorphic and work properly on all the
> datatypes that could legally be substituted for lists.  Basically, I
> want a super-simple way to parameterize my code over the fundamental
> data structures it uses.

And how would such programs interact with each other?  You run into
some problems that can easily mean an interface that ends up
translating a whole data structure on each barrier crossing.  This can
be catastrophic with something like

  (module stack scheme/base
    (provide push)
    (define empty null)
    (define (push x stack) (cons x stack)))

  (module foo scheme/basic-with-vector-default
    (require stack)
    (define stack (push 1 empty))
    (car stack))


> This is my ideal, but maybe this kind of polymorphism is a bit
> far-fetched in the context of the Scheme way of doing things.

Seems to me to be far fetched regardless of the language.  I'm sure
that there are some possibly good solutions, but they will be limited
in the way that you can pass values around.


> But right now, it's way too difficult to explore the effects of
> changing data structures on a given program.

Why?  Just like in any language -- simply write your code with proper
separation of your data and operations from the code that uses them,
and then you have just one place to change to see the performance
effect.  This way you control the abstractions that you need.  (BTW,
lists just happen to be so fundamental to scheme that it's easy to
abuse them in a way that makes it hard to recover.)


> Is there a list of the planned list library additions somewhere?

On top of what's in, I have these things that will probably get in:
`filter-map', `append-map', `remove-duplicates', `partition',
`list-interleave' (or list-join), `map/values', `filter-not'.


On Mar 30, Mark Engelberg wrote:
> On Sun, Mar 30, 2008 at 2:18 PM, Eli Barzilay <eli at barzilay.org> wrote:
> >  > What about integration with the reader?
> >
> >  Same here.  (But a reader syntax is not too useful since it means
> >  that you could have used plain lists.)
> 
> Sorry, what I meant was more of an integration with the way streams
> are displayed in the evaluation window.  For example, numbers have a
> lot of close integration, where you can right-click on a number and
> view the number in various ways.  Similarly, I'd like to be able to
> right-click on a stream and choose various options, such as seeing
> the already-forced portion of the stream, or choosing a fixed number
> of items to force and show (such as the first 100 items followed by
> a ...)

These might get implemented eventually as part of lazy scheme (which
requires more drscheme support for promises).


> >  My experience is that there are way too many options to have a
> >  clear choice here -- and my personal preference for dealing with
> >  it is to just bite all the way through and go with a lazy
> >  language, instead of the stream half-baked approach.
> 
> Is Lazy Scheme going to fill that niche?

I don't see why not.


> Last time I played around with Lazy Scheme, it didn't interact very
> well with large chunks of the Scheme libraries.

(This will likely continue to be the case.)


> For me, I find that in order to avoid iteration, I sometimes need:
> 1.  A stream/iterator/generator that looks and acts like an infinite
> list, but recomputes its values every time.
> 2.  A stream/lazy list that looks and acts like an infinite list, but
> only computes its values the first time and then caches them.
> 3.  A lazy keyword that acts like a delay with an auto-force upon
>     evaluation.
> 
> The first two can probably be done to some extent in libraries.

Yes.


> I doubt the third one can be.

It cannot.  To do this, you change *all* references to values to check
if they're these kind of promises and force them if so.  The cost of
something like this is huge, and on top of this you run into problems
where in some cases you don't want such automatic force (eg, in
arguments to `list').


> But these features seem sufficient to me, and by picking and
> choosing where to apply those techiques, I don't end up with
> programs whose space performance is as inscrutable as in, say
> Haskell.

That's unlikely to go away -- with laziness you can easily run into
the same kind of debugging horrors -- no matter what language it's in.


> The designers of Scala and F# probably have similar tastes to me in
> this regard, because these forms of laziness are offered by those
> languages.

(Any pointers with interaction examples?)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.