[racket] Cleaner way to work with gzipped data?
On 08/05/2013 04:29 PM, JP Verkamp wrote:
> Is there a nice / idiomatic way to work with gzipped data in a streaming
> manner (to avoid loading the rather large files into memory at once)? So
> far as I can tell, my code isn't doing that. It hangs for a while on the
> call to gunzip-through-ports, long enough to uncompress the entire file,
> then reads are pretty quick afterwards.
>
> Here's what I have thus far:
>
> #lang racket
>
> (require file/gunzip)
>
> (define-values (pipe-from pipe-to) (make-pipe))
> (with-input-from-file "test.rkt.gz"
>   (lambda ()
>     (gunzip-through-ports (current-input-port) pipe-to)
>     (for ([line (in-lines pipe-from)])
>       (displayln line))))
You should probably 1) limit the size of the pipe (to stop it from
inflating the whole file at once) and 2) put the gunzip-through-ports
call in a separate thread. The gunzip thread will block when the pipe is
full; when your program reads some data out of the pipe, the gunzip
thread will be able to make some more progress. Something like this:
(define-values (pipe-from pipe-to) (make-pipe 4000))
(with-input-from-file "test.rkt.gz"
  (lambda ()
    (thread
     (lambda ()
       (gunzip-through-ports (current-input-port) pipe-to)
       (close-output-port pipe-to)))
    (for ([line (in-lines pipe-from)])
      (displayln line))))
> As an additional problem, that code doesn't actually work.
> in-lines seems to be waiting for an eof-object? that
> gunzip-through-ports isn't sending. Am I missing something? It ends up
> just hanging after reading and printing the file.
The docs don't say anything about closing the port, so you'll probably
have to do that yourself. In the code above, I added a call to
close-output-port.
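
If you need this in more than one place, the bounded pipe, the worker thread, and the explicit close could be wrapped up in a helper. This is only a rough sketch: the name call-with-gunzipped-file and the #:limit keyword are made up for illustration and are not part of file/gunzip, and the 4096-byte default is an arbitrary choice.

```racket
#lang racket
(require file/gunzip)

;; Hypothetical helper (not part of file/gunzip): calls `proc` with an
;; input port that yields the decompressed contents of `path`.
;; `limit` bounds the pipe buffer, so the inflater blocks until the
;; consumer has read some data.
(define (call-with-gunzipped-file path proc #:limit [limit 4096])
  (call-with-input-file path
    (lambda (in)
      (define-values (pipe-from pipe-to) (make-pipe limit))
      (define worker
        (thread
         (lambda ()
           ;; Close the write end even if gunzip raises, so the
           ;; reader sees EOF instead of hanging.
           (dynamic-wind
            void
            (lambda () (gunzip-through-ports in pipe-to))
            (lambda () (close-output-port pipe-to))))))
      ;; Return proc's result; stop the worker in case proc
      ;; returned before consuming all of the data.
      (begin0 (proc pipe-from)
              (kill-thread worker)))))

;; Example use, assuming "test.rkt.gz" exists:
;; (call-with-gunzipped-file "test.rkt.gz"
;;   (lambda (in)
;;     (for ([line (in-lines in)])
;;       (displayln line))))
```

Passing the file port in explicitly, rather than relying on current-input-port, is just a design preference here; the version above with with-input-from-file works too, since a new thread inherits the parameter values of the thread that creates it.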
Ryan