<div dir="ltr">Figured it out and cleaned it up. It turns out that I was using <font face="courier new, monospace">with-handlers</font> oddly, but after reading further through the documentation, it works as expected. Here's a new version (generalized to any input-port):<div>
<br></div><div><div><font face="courier new, monospace">(define (with-gunzip thunk)</font></div><div><font face="courier new, monospace"> (define-values (pipe-from pipe-to) (make-pipe))</font></div><div><font face="courier new, monospace"> (with-handlers ([exn:fail?</font></div>
<div><font face="courier new, monospace"> (λ (err)</font></div><div><font face="courier new, monospace"> (close-output-port pipe-to)</font></div><div><font face="courier new, monospace"> (close-input-port pipe-from)</font></div>
<div><font face="courier new, monospace"> (error 'with-gunzip "~a" (exn-message err)))])</font></div><div><font face="courier new, monospace"> (gunzip-through-ports (current-input-port) pipe-to)</font></div>
<div><font face="courier new, monospace"> (close-output-port pipe-to)</font></div><div><font face="courier new, monospace"> (parameterize ([current-input-port pipe-from])</font></div><div><font face="courier new, monospace"> (thunk))</font></div>
<div><font face="courier new, monospace"> (close-input-port pipe-from)))</font></div></div><div><br></div><div>If anyone's interested in a more in-depth write-up / source code for this and with-gzip:</div><div>- writeup: <a href="http://blog.jverkamp.com/2013/08/06/adventures-in-racket-gzip/">http://blog.jverkamp.com/2013/08/06/adventures-in-racket-gzip/</a></div>
<div>- source: <a href="https://github.com/jpverkamp/small-projects/tree/master/blog/with-gzip.rkt">https://github.com/jpverkamp/small-projects/tree/master/blog/with-gzip.rkt</a></div></div><div class="gmail_extra"><br><br>
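<div>One caveat with the version above: with an unlimited pipe and no thread, <font face="courier new, monospace">gunzip-through-ports</font> inflates the whole input into the pipe before the thunk runs. A minimal sketch that keeps the bounded pipe and worker thread from Ryan's suggestion while still cleaning up on errors might look like the following. The name <font face="courier new, monospace">with-gunzip/streaming</font>, the <font face="courier new, monospace">#:buffer-size</font> keyword, and the 4096 default are placeholders of mine, not part of <font face="courier new, monospace">file/gunzip</font>, and I have only reasoned about it for the simple case where the thunk reads to EOF:</div>
<pre style="font-family:courier new, monospace">
#lang racket
(require file/gunzip)

;; Hypothetical helper: gunzip (current-input-port) in a worker thread
;; through a bounded pipe, and run thunk with the inflated data as its
;; current-input-port.
(define (with-gunzip/streaming thunk #:buffer-size [buffer-size 4096])
  (define in (current-input-port))
  (define-values (pipe-from pipe-to) (make-pipe buffer-size))
  (define failure #f) ; set by the worker if inflation fails
  (define worker
    (thread
     (λ ()
       (with-handlers ([exn:fail? (λ (e) (set! failure e))])
         (gunzip-through-ports in pipe-to))
       ;; Closing the write end is what lets the reader see EOF.
       (close-output-port pipe-to))))
  (dynamic-wind
   void
   (λ ()
     (begin0
       (parameterize ([current-input-port pipe-from])
         (thunk))
       ;; Re-raise a gunzip error once the thunk has finished.
       (when failure (raise failure))))
   (λ ()
     ;; Runs on normal return and on errors; closing twice is harmless.
     (kill-thread worker)
     (close-output-port pipe-to)
     (close-input-port pipe-from))))
</pre>
<div>Used as <font face="courier new, monospace">(with-input-from-file "test.rkt.gz" (λ () (with-gunzip/streaming (λ () (for ([line (in-lines)]) (displayln line))))))</font>, the worker blocks whenever the pipe is full, so only about <font face="courier new, monospace">buffer-size</font> bytes of inflated data sit in memory at a time. Unlike the version above, it also returns the thunk's result.</div><br>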
<div class="gmail_quote">On Mon, Aug 5, 2013 at 5:36 PM, JP Verkamp <span dir="ltr"><<a href="mailto:racket@jverkamp.com" target="_blank">racket@jverkamp.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Thanks! <font face="courier new, monospace">make-pipe</font> isn't something that I've had to use otherwise, so I missed the optional parameter. That does certainly seem to help.<div><br></div><div>
Here's my first take of <font face="courier new, monospace">with-input-from-gzipped-file</font>:</div><div><br></div><div><div><div><font face="courier new, monospace">(define (with-input-from-gzipped-file filename thunk #:buffer-size [buffer-size #f])</font></div>
<div><font face="courier new, monospace"> (call-with-input-file filename</font></div><div><font face="courier new, monospace"> (lambda (file-from)</font></div><div><font face="courier new, monospace"> (define-values (pipe-from pipe-to) (make-pipe buffer-size))</font></div>
<div><font face="courier new, monospace"> </font></div><div><font face="courier new, monospace"> (thread </font></div><div><font face="courier new, monospace"> (λ () </font></div><div><font face="courier new, monospace"> (gunzip-through-ports file-from pipe-to)</font></div>
<div><font face="courier new, monospace"> (close-output-port pipe-to)))</font></div><div><font face="courier new, monospace"> </font></div><div><font face="courier new, monospace"> (current-input-port pipe-from)</font></div>
<div><font face="courier new, monospace"> (thunk)</font></div><div><font face="courier new, monospace"> (close-input-port pipe-from))))</font></div></div><div><br></div><div>The main thing missing is that there's no error handling (where the pipe should still be closed). At the very least, if I try to call this on a non-gzipped file, it breaks on the <font face="courier new, monospace">gunzip-through-ports</font> line. Theoretically, some variation of <font face="courier new, monospace">with-handlers</font> should work (<font face="courier new, monospace">error</font> should raise an <font face="courier new, monospace">exn:fail?</font>, yes?), but it doesn't seem to be helping.</div>
<div><br></div><div>Any help with that?</div><div><br></div><div>Alternatively, I've now found this: <a href="http://planet.racket-lang.org/display.ss?package=gzip.plt&owner=soegaard" target="_blank">http://planet.racket-lang.org/display.ss?package=gzip.plt&owner=soegaard</a></div>
<div><br></div><div>It seems to do exactly what I need, albeit without the call-with-* forms, but that's easy enough to wrap. With some very basic testing, it does seem to be buffering, although it is a bit slower than the above. Not enough to cause trouble, though.</div>
</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Aug 5, 2013 at 4:51 PM, Ryan Culpepper <span dir="ltr"><<a href="mailto:ryanc@ccs.neu.edu" target="_blank">ryanc@ccs.neu.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On 08/05/2013 04:29 PM, JP Verkamp wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Is there a nice / idiomatic way to work with gzipped data in a streaming<br>
manner (to avoid loading the rather large files into memory at once)? So<br>
far as I can tell, my code isn't doing that. It hangs for a while on the<br>
call to gunzip-through-ports, long enough to uncompress the entire file,<br>
then reads are pretty quick afterwards.<br>
<br>
Here's what I have thus far:<br>
<br>
#lang racket<br>
<br>
(require file/gunzip)<br>
<br>
(define-values (pipe-from pipe-to) (make-pipe))<br>
(with-input-from-file "test.rkt.gz"<br>
(lambda ()<br>
(gunzip-through-ports (current-input-port) pipe-to)<br>
(for ([line (in-lines pipe-from)])<br>
(displayln line))))<br>
</blockquote>
<br></div>
You should probably 1) limit the size of the pipe (to stop it from inflating the whole file at once) and 2) put the gunzip-through-ports call in a separate thread. The gunzip thread will block when the pipe is full; when your program reads some data out of the pipe, the gunzip thread will be able to make some more progress. Something like this:<br>
<br>
(define-values (pipe-from pipe-to) (make-pipe 4000))<br>
(with-input-from-file "test.rkt.gz"<br>
(lambda ()<br>
(thread<br>
(lambda ()<br>
(gunzip-through-ports (current-input-port) pipe-to)<br>
(close-output-port pipe-to)))<div><br>
(for ([line (in-lines pipe-from)])<br>
(displayln line))))<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
As an additional problem, that code doesn't actually work.<br>
in-lines seems to be waiting for an eof-object? that<br>
gunzip-through-ports isn't sending. Am I missing something? It ends up<br>
just hanging after reading and printing the file.<br>
</blockquote>
<br></div>
The docs don't say anything about closing the port, so you'll probably have to do that yourself. In the code above, I added a call to close-output-port.<span><font color="#888888"><br>
<br>
Ryan<br>
<br>
</font></span></blockquote></div><br></div>
</div></div></blockquote></div><br></div>