[racket-dev] racket/stream

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Mar 16 10:04:17 EDT 2011

Here's a proposal for cleaning up `racket/stream'.

The first part of the proposal is terminology:

 * A _sequence_ is any kind of input to `for'. That is, it's a generic
   ordered collection of elements that supports iteration through the
   elements.

   Sequences include lists, vectors, and input ports.

   All generators (as defined next) are sequences and all streams (as
   defined further below) are sequences.

 * A _generator_ is a stateful object that encapsulates a sequence and
   produces the next element on demand.

   Generators include the result of `generator' and input ports. (The
   `generator?' predicate currently returns false for input ports, but
   that could change.)

   All generators are sequences. (Currently, the result of the
   `generator' form cannot be used directly as a sequence, but part of
   this proposal is to fix that.)

   Not all sequences are generators. For example, a list is a sequence,
   but it is not a generator. It's easy to create a generator from any
   sequence by creating a stateful object that encapsulates a
   continuation of a sequence traversal.

 * A _stream_ is a functional object with operations `stream-first' for
   getting the first element and `stream-rest' for getting the rest of
   the elements. That is, for a given stream value, applying
   `stream-first' multiple times keeps returning the first value of the
   stream, and `stream-rest' multiple times gives back the same rest of
   the stream.

   Streams include lists and lazy lists as produced by `stream-cons'
   (i.e., the usual one instead of the one currently exported by
   `racket/stream').

   All streams are sequences.

   Not all sequences are streams. For example, an input port is a
   sequence but not a stream. It's easy to define a stream in terms of
   a sequence by lazily extracting elements from the sequence and
   caching them.

   A vector can be a stream only if streams are allowed to be mutable.
   That is, `stream-first' always returns the first element of a
   stream, and since the first element of a vector can change,
   `stream-first' of a vector can change. Alternatively, streams could
   be defined as immutable, in which case a vector is not a stream
   (although a stream could be generated from a vector).

The renaming of `racket/sequence' to `racket/stream' seems to have been
an attempt to use the word "stream" for the most general case. But
"stream" has an established functional meaning the Lisp world, while
"sequence" is already set up in Racket to be the more general case. The
above notion of "generator" is also consistent with existing uses, as
far as I can tell. I haven't been able to find a widely used term for a
supertype of "stream" and "generator", and I think "sequence" works as
well as anything, but let me know if I've overlooked a term that we
should be using.

Based on the above terminology, here's the programming part of the
proposal:

 1. Rename `racket/stream' and its bindings back to "sequence".

 2. Introduce a `racket/stream' library that provides stream versions
    (typically lazy) of the current operations, including the expected
    lazy `stream-cons' form.

 3. Make streams work as sequences, and fill in various conversions
    like `for/stream' for lazily building a stream and `in-stream' to
    potentially speed up stream iteration via `for'.

Adding streams as sequences effectively introduces a simpler
representation for sequences that fit neatly into the stream protocol.
Along those lines, operations like `sequence-append' can detect when
all arguments are streams and defer to `stream-append' to keep any
representation benefits intact.



Posted on the dev mailing list.