[racket-dev] racket/stream

From: Eli Barzilay (eli at barzilay.org)
Date: Wed Mar 16 13:21:28 EDT 2011

Three hours ago, Matthew Flatt wrote:
> Here's a proposal for cleaning up `racket/stream'.
> The first part of the proposal is terminology:
> [...]

That's close to what we talked about, with the addition of generators.

>    Generators include the result of `generator' and input
>    ports. (The `generator?' predicate currently returns false for
>    input ports, but that could change.)

I'd rather see these generators be only thunks (explained below) -- so
all thunks can be used as generators.  Possibly, "generator" becomes
just a name for a thunk used to retreive values, usually based on some
state -- but since this is confusing, I'll use "producer" here.

Using producer thunks looks to me better than generators (in the
`racket/generator' sense) because some of them are not things that you
could iterate over straightforwardly -- for example,

  (generator () (yield (yield 1)))

will expect a value on the second call (or with an input value, after
your commit today, the first will require a value too).  It also looks
to me better than an input port, which naturally resolves the question
of how to read from the port -- you just turn it into a producer with

  (lambda () (some-read in))

>    All generators are sequences.

This changes to "all producers are sequences", which means (IIUC) just
changing the default iteration dispatch (when no `in-*' is used) use
`in-producer' when the value to iterate over is a thunk.  (This might
be to permissive, so maybe only for strict thunks, where the arity is
only 0.)

>    (Currently, the result of the `generator' form cannot be used
>    directly as a sequence, but part of this proposal is to fix
>    that.)

(The symmetrical multiple arity on both input and output, generators
can have different arities at different points, so I don't think that
this could be done in general -- not without changing it, for example,
to forbid input values to yield, and making it always return
`#<void>'.  But I'd like it if it keeps a close match to the
language's multiple value capability.)

>    Not all sequences are generators. For example, a list is a
>    sequence, but it is not a generator. It's easy to create a
>    generator from any sequence by creating a stateful object that
>    encapsulates a continuation of a sequence traversal.

And this seems like it's simplified too -- this conversion to a
producer becomes:

  (generator () (for ([i some-sequence]) (yield i)))

>  * A _stream_ [...]
>    Streams include lists and lazy lists as produced by `stream-cons'
>    (i.e., the usual one instead of the one currently exported by
>    `racket/stream').
>    All streams are sequences.

I'll just note here that this was the direction that we talked about.
In other words, (which is easy to phrase now that we have these
terms), things like `sequence-cdr' are inherently bad ideas, and what
you really want is `stream-cdr'.  So the rename was basically a
stopgap for redoing it with proper streams.

>    Not all sequences are streams. For example, an input port is a
>    sequence but not a stream. It's easy to define a stream in terms
>    of a sequence by lazily extracting elements from the sequence and
>    caching them.

Another quick note: I really like this change of adding the *two*
concepts of streams and producers.  One thing that I also wanted is to
leave "sequences" as whatever gets used inside a for loop, so it's
generally not a first class value.  So it can be:

  * All streams are sequences
  * All producers are sequences
  * An (in-* ...) form creates a sequence when used in a for loop
    and when used outside, as a value,
    - some can evaluate to a stream, like `in-naturals'
    - and some can evaluate to a producer, like `in-port'

This can also improve the current state where the interface for
sequences is very heavy.  Instead of that, you just write code that
uses producers or streams, and sequence thing becomes an internal
implementation detail that can change.

Perhaps keeping backward compatibility, but I think that changing the
for iterations to use streams and producers directly will not only
simplify it, it will also be much faster when there's a need to use
them as first class values.  (Whereas now such uses have a high cost.)

>    A vector can be a stream only if streams are allowed to be
>    mutable.  That is, `stream-first' always returns the first
>    element of a stream, and since the first element of a vector can
>    change, `stream-first' of a vector can change. Alternatively,
>    streams could be defined as immutable, in which case a vector is
>    not a stream (although a stream could be generated from a
>    vector).

I'm not sure about the point of immutability (at least it's not
enforcable given that sometimes it's useful to have streams without
caching or maybe different promise kinds), but I don't see a need to
make vectors be streams too.  It seems to me fine to leave it as:
streams are sequences; producers are sequences; some other values are
sequences (because of the dynamic dispatch done by for loops).

(And since vectors are sequences, and sequences are easily converted
to streams or producers, then it's easy to convert vectors to them
too, if needed, and you'll need to choose either one which seems like
a decision that should be left to the user.)

> The renaming of `racket/sequence' to `racket/stream' seems to have
> been an attempt to use the word "stream" for the most general
> case. But "stream" has an established functional meaning the Lisp
> world, while "sequence" is already set up in Racket to be the more
> general case. The above notion of "generator" is also consistent
> with existing uses, as far as I can tell.

(BTW, I'm suggesting "producer" since I think it's common enough, and
leaves `racket/generator' to use the term in the same way Python does,
which seems pretty popular.)

> I haven't been able to find a widely used term for a supertype of
> "stream" and "generator", and I think "sequence" works as well as
> anything, but let me know if I've overlooked a term that we should
> be using.

(Seems fine to me...)

> Based on the above terminology, here's the programming part of the
> proposal:
>  1. Rename `racket/stream' and its bindings back to "sequence".
>  2. Introduce a `racket/stream' library that provides stream
>     versions (typically lazy) of the current operations, including
>     the expected lazy `stream-cons' form.

So what we talked about is similar, except that #2 replaces
`racket/stream' in place.  `racket/sequence' doesn't happen since
there's no point in that (and AFAICT, people really want the stream
operations, not the sequence ones).

I also like the separation of streams and producers for this: if it
turns out that there's some need for a `racket/producer' library with
things like `producer-cdr', then the potential problems in that are
much clearer.

> Adding streams as sequences effectively introduces a simpler
> representation for sequences that fit neatly into the stream
> protocol.


> Along those lines, operations like `sequence-append' can detect when
> all arguments are streams and defer to `stream-append' to keep any
> representation benefits intact.

So if the above is done, then there is no `sequence-append' -- but
there is still `in-sequences' which is essentially the same right now.
(Sequences are not first class values then there is no need for this
to be available outside of an iteration specification, but probably
easy to keep it, as something that returns a producer.)

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.