[racket] Cleaner way to work with gzipped data?

From: Ryan Culpepper (ryanc at ccs.neu.edu)
Date: Mon Aug 5 16:51:13 EDT 2013

On 08/05/2013 04:29 PM, JP Verkamp wrote:
> Is there a nice / idiomatic way to work with gzipped data in a streaming
> manner (to avoid loading the rather large files into memory at once)? So
> far as I can tell, my code isn't doing that. It hangs for a while on the
> call to gunzip-through-ports, long enough to uncompress the entire file,
> then reads are pretty quick afterwards.
>
> Here's what I have thus far:
>
> #lang racket
>
> (require file/gunzip)
>
> (define-values (pipe-from pipe-to) (make-pipe))
> (with-input-from-file "test.rkt.gz"
>    (lambda ()
>      (gunzip-through-ports (current-input-port) pipe-to)
>      (for ([line (in-lines pipe-from)])
>        (displayln line))))

You should probably 1) limit the size of the pipe (to stop it from 
inflating the whole file at once) and 2) put the gunzip-through-ports 
call in a separate thread. The gunzip thread will block when the pipe is 
full; when your program reads some data out of the pipe, the gunzip 
thread will be able to make some more progress. Something like this:

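;; Bounded pipe: at most 4000 bytes are buffered at a time.
(define-values (pipe-from pipe-to) (make-pipe 4000))
(with-input-from-file "test.rkt.gz"
   (lambda ()
     ;; Inflate in a background thread; it blocks whenever the pipe is full.
     (thread
       (lambda ()
         (gunzip-through-ports (current-input-port) pipe-to)
         ;; Close the write end so the reader eventually sees eof.
         (close-output-port pipe-to)))
     ;; Read lines as they become available.
     (for ([line (in-lines pipe-from)])
       (displayln line))))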
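
If you need this in more than one place, the same pattern can be wrapped in a small helper. This is only a sketch: call-with-gunzipped-file and its #:buffer-size keyword are names I made up here, not part of file/gunzip. The dynamic-wind is my own addition, so that the pipe still gets closed (and the reader still sees eof) if inflating fails partway through.

```racket
#lang racket
(require file/gunzip)

;; Hypothetical helper, not provided by file/gunzip: opens `path`, inflates it
;; in a background thread through a bounded pipe, and calls `proc` with the
;; read end of that pipe. Returns whatever `proc` returns.
(define (call-with-gunzipped-file path proc #:buffer-size [size 4000])
  (define-values (pipe-from pipe-to) (make-pipe size))
  (call-with-input-file path
    (lambda (in)
      (thread
       (lambda ()
         (dynamic-wind
          void
          (lambda () (gunzip-through-ports in pipe-to))
          ;; Always close the write end, even if gunzip raises.
          (lambda () (close-output-port pipe-to)))))
      (proc pipe-from))))
```

For example, (call-with-gunzipped-file "test.rkt.gz" (lambda (in) (for ([line (in-lines in)]) (displayln line)))) should behave like the code above. Note that proc is expected to read to eof; if it returns early, the file is closed while the gunzip thread may still be running.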

> As an additional problem, that code doesn't actually work.
> in-lines seems to be waiting for an eof-object? that
> gunzip-through-ports isn't sending. Am I missing something? It ends up
> just hanging after reading and printing the file.

The docs don't say anything about closing the port, so you'll probably 
have to do that yourself. In the code above, I added a call to 
close-output-port.
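
The eof behavior is easy to see without gzip at all. In this small sketch (the names in, out, and lines are just for illustration), the read side of a pipe reports eof only after the write side has been closed:

```racket
#lang racket

;; A reader on a pipe sees eof only once the pipe's output side is closed.
(define-values (in out) (make-pipe))
(write-string "one\ntwo\n" out)
;; Without this call, the in-lines loop below would block after "two",
;; waiting for more data.
(close-output-port out)
(define lines (for/list ([line (in-lines in)]) line))
```

Here lines ends up as the list of "one" and "two", and the loop terminates because the pipe was closed.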

Ryan


Posted on the users mailing list.