[plt-scheme] interleaving the I and O in I/O

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Sun Jan 27 10:29:34 EST 2008

At Sun, 27 Jan 2008 09:36:40 -0500, Prabhakar Ragde wrote:
> Colleagues of mine discovered this issue when automarking simple 
> programs written by second-year students. Consider the following program 
> for reading numbers from a file and printing the sequence out twice, 
> intended to be invoked on the command line with file redirection.
> 
> (define (print-list l)
>    (cond
>      [(null? l) (void)]
>      [else (printf "~a\n" (car l))
>            (print-list (cdr l))]))
> 
> (define (get-input acc)
>    (let
>      [(i (read))]
>      (cond
>        [(eof-object? i) (reverse acc)]
>        [else (printf "~a\n" i)
>              (get-input (cons i acc))])))
> 
> (define l (get-input '()))
> ;(print-list l)
> (print-list l)
> 
> When I save this in prog.ss on my MacBookPro running v372 and type
> 
> % time mzscheme -r prog.ss <bignums.txt >progout.txt
> 
> where bignums has 200,000 numbers, I get the following timing information:
> 
> real	0m3.775s
> user	0m2.221s
> sys	0m1.550s
> 
> If I comment out the (printf ...) in get-input and uncomment the 
> (print-list l), the times become:
> 
> real	0m0.979s
> user	0m0.915s
> sys	0m0.054s
> 
> I'm curious as to what is going on 

When you read from the original input port, the original output port is
automatically flushed. So, the first version flushes after every line it
prints (one write per number), while the second writes the list in
blocks. I think that accounts for the time difference, much of which
shows up as sys time (1.550s versus 0.054s).
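
As a rough sketch of one way to sidestep the per-read flush (an
assumption for illustration, not something proposed in this thread), the
echo can go to a string port while reading, so the original output port
has nothing pending each time (read) is called. The names `echo` and
`run` are hypothetical; print-list is the one from the original program.

```scheme
;; Sketch only: `echo` and `run` are illustrative names, not from the thread.
(define (print-list l)
  (cond
    [(null? l) (void)]
    [else (printf "~a\n" (car l))
          (print-list (cdr l))]))

;; Like the original get-input, but echoes each datum to the port
;; `echo` instead of the current output port.
(define (get-input acc echo)
  (let ([i (read)])
    (cond
      [(eof-object? i) (reverse acc)]
      [else (fprintf echo "~a\n" i)
            (get-input (cons i acc) echo)])))

(define (run)
  (let* ([echo (open-output-string)]
         [l (get-input '() echo)])
    ;; All reads are done, so nothing forces a flush between these writes.
    (display (get-output-string echo))
    (print-list l)))
```

Ending the file with (run) gives the same output as the first version
of the program. The cost is that the whole echo is held in memory until
end of input, which is fine for batch use but wrong for an interactive
program.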

I note that there's another layer of automatic behavior here, which is
that block buffering for output is enabled because output is directed
to a file, as opposed to a tty.
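
To see which case applies in a given run, something like the following
could be used. It is a minimal sketch that assumes terminal-port? and
the one-argument form of file-stream-buffer-mode behave as documented
for MzScheme; `describe-stdout` is a hypothetical helper name.

```scheme
;; Sketch only: `describe-stdout` is an illustrative name.
;; The report goes to stderr so that redirecting stdout does not hide it.
(define (describe-stdout)
  (let ([out (current-output-port)])
    (fprintf (current-error-port)
             "terminal? ~a  buffer mode: ~a\n"
             (terminal-port? out)
             (file-stream-buffer-mode out))))
```

Calling (describe-stdout) once at the terminal and once with
>progout.txt should show the difference in buffer mode between the two.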

For 4.0, we should reconsider auto-flush of the current output port
when reading from the current input port. Maybe it should only happen
when the output is a tty. It's also awkward that the behavior is tied
to the original ports, but that's another part of the trade-off in
performance and expected behavior, and it will probably stay.

> and what this means for program 
> design in general.

The buffering and flushing rules try to balance performance with
expected behavior. In this case, an example of "expected behavior" is
that when you print text to stdout to ask users a question (maybe
without a newline), they see the question and can type an answer.
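
A small sketch of that prompt case, written with an explicit
flush-output so that it does not depend on the automatic flush rule.
The helper name `ask` is hypothetical.

```scheme
;; Sketch only: `ask` is an illustrative name.
;; Print a question without a newline, make sure it is visible,
;; then read one line of input as the answer.
(define (ask question)
  (display question)
  (flush-output)   ; explicit flush of the current output port
  (read-line))
```

For example, (ask "What is your name? ") shows the question before
waiting for input even when output is block-buffered.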

Maybe the general conclusion is that the balance between performance
and simplicity is rarely easy to find.

Matthew



Posted on the users mailing list.