[plt-scheme] writing to subprocess's stdin and then...

From: Eli Barzilay (eli at barzilay.org)
Date: Tue Jun 17 04:49:10 EDT 2008

On Jun 17, YC wrote:
> On Mon, Jun 16, 2008 at 10:40 PM, Eli Barzilay <eli at barzilay.org> wrote:
> >
> > * finally, the biggest problem is a conceptual one: you read stuff
> >   from the output only after the process has finished -- but what
> >   if it spits out a lot of data?  In that case, it will not
> >   finish, and instead just sit there waiting for you to read that
> >   data, and you'll be getting into a very common race condition
> >   with subprocesses.
> >
> >   What you really need is to do the reading in a thread, so the
> >   process can continue running.  It might seem strange at first,
> >   but when there's a lot of data then *someone* needs to hold it,
> >   and the OS will hold only a very small amount (and for good
> >   reasons).  Your thread will need to do just that accumulation
> >   (or it can just do the processing, whatever it is).
> 
> After re-reading your example, I think I started to grok what you were doing
> on http://www.cs.brown.edu/pipermail/plt-scheme/2006-February/011953.html:
> 
> ...
> (define-values (in out) (make-pipe)) ...
> ...
> (thread (lambda ()
>           (copy-port pout out)
>           (close-output-port out)
>           (subprocess-wait p)))
> 
> You first created a pipe for holding the accumulation, and then you
> started a thread to read the data from pout into pipe's out, and
> when out is closed the data gets piped to in (perhaps this is
> happening in the background without you having to close it too?), and
> finally the process exits...  correct?

Actually that extra pipe and thread are not strictly needed.  Same for
the use of /dev/null -- it can just close the subprocess's input right
after it fires it up.  Below is a more compact and 4.0-ized example.


> But shouldn't the ports be closed after subprocess-wait?

You usually want to close the process's input port so it will finish,
since many useful processes (at least on unix) work until their stdin
runs out.
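
To illustrate the point about stdin, here is a minimal sketch using
the unix `sort' program, which prints nothing until it sees EOF on its
input.  The helper name `sort-lines-externally' is made up for this
illustration, and it assumes the input is small enough that writing
all of it before reading anything back will not block.

```scheme
#lang scheme

;; Hypothetical helper: feed `lines' to the external `sort' program
;; and return the lines it prints.
(define (sort-lines-externally lines)
  (define-values (p pout pin perr)
    ;; #f for all three ports: let `subprocess' create the pipes
    (subprocess #f #f #f (find-executable-path "sort")))
  (for ([l lines]) (fprintf pin "~a\n" l))
  ;; `sort' keeps waiting for more input until its stdin is closed
  (close-output-port pin)
  (begin0 (for/list ([l (in-lines pout)]) l)
    (subprocess-wait p)
    (close-input-port pout)
    (close-input-port perr)))
```

Without the `close-output-port' call, the loop over `pout' would wait
forever, since `sort' would still be waiting for more input.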


Here's the revised example -- with no use of threads.

 | #lang scheme
 | 
 | (require scheme/port)
 | (define (with-input-from-subprocess exe thunk)
 |   (define-values (p pout pin perr)
 |     (subprocess #f #f (current-error-port)
 |                 (find-executable-path exe)))
 |   (close-output-port pin)
 |   (parameterize ([current-input-port pout])
 |     (begin0 (thunk)
 |       (subprocess-wait p))))
 | 
 | (with-input-from-subprocess "du"
 |   (lambda ()
 |     (for ([line (in-lines)])
 |       (printf ">> ~s\n" (regexp-split #rx"[/\t]" line)))))


But it still relies on using (current-error-port) for the subprocess's
stderr, which might not work if this function is called inside a
`parameterize' that sets it to something that is not a file-stream
port.  The mzlib/process code takes care of such cases -- you can just
run

  (parameterize ([current-output-port (open-output-bytes)])
    (system "du")
    (get-output-bytes (current-output-port)))

If you look at that file, you'll see that in this case the code will
make the necessary pipes and a thread to transfer their contents.
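
A rough sketch of that idea follows.  It is not the actual
mzlib/process code; the name `run-copying-output' is made up here, and
it assumes the executable can be found on the path.  The subprocess
gets fresh pipes, and two threads copy them into whatever the current
output and error ports happen to be.

```scheme
#lang scheme
(require scheme/port)

;; Hypothetical helper, not the mzlib/process implementation: run `exe'
;; with `args', copying its stdout and stderr into the current ports
;; even when those are not file-stream ports.
(define (run-copying-output exe . args)
  (define-values (p pout pin perr)
    ;; #f for all three ports: let `subprocess' create the pipes
    (apply subprocess #f #f #f (find-executable-path exe) args))
  ;; no input for the process, so it sees EOF right away
  (close-output-port pin)
  (let ([out (current-output-port)]
        [err (current-error-port)])
    ;; the threads drain the pipes so the process never blocks on a
    ;; full pipe while we wait for it
    (define t-out (thread (lambda () (copy-port pout out))))
    (define t-err (thread (lambda () (copy-port perr err))))
    (subprocess-wait p)
    (thread-wait t-out)
    (thread-wait t-err)
    (close-input-port pout)
    (close-input-port perr)
    (subprocess-status p)))
```

It can be used the same way as `system' above:

  (parameterize ([current-output-port (open-output-bytes)])
    (run-copying-output "du")
    (get-output-bytes (current-output-port)))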

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.