[racket] Cleaner way to work with gzipped data?

From: JP Verkamp (racket at jverkamp.com)
Date: Mon Aug 5 17:36:19 EDT 2013

Thanks! make-pipe isn't something that I've had to use otherwise, so I
missed the optional parameter. That does certainly seem to help.

Here's my first take of with-input-from-gzipped-file:

(define (with-input-from-gzipped-file filename thunk
                                      #:buffer-size [buffer-size #f])
  (call-with-input-file filename
    (lambda (file-from)
      (define-values (pipe-from pipe-to) (make-pipe buffer-size))

      ;; Decompress in a separate thread so reads can proceed as data arrives.
      (thread
        (λ ()
          (gunzip-through-ports file-from pipe-to)
          (close-output-port pipe-to)))

      ;; parameterize restores the previous current-input-port afterwards.
      (parameterize ([current-input-port pipe-from])
        (thunk))
      (close-input-port pipe-from))))

The main thing missing is that there's no error handling (where the pipe
should still be closed). At the very least, if I try to call this on a
non-gzipped file, it breaks on the gunzip-through-ports line.
Theoretically, some variation of with-handlers should work (error should
raise an exn:fail?, yes?), but it doesn't seem to be helping.

Any help with that?
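
One likely reason with-handlers does not help: the exception is raised
inside the gunzip thread, and a handler installed in the calling thread does
not see exceptions from other threads. A minimal sketch of one way around
that, assuming gunzip-through-ports signals bad input with an exn:fail? (the
names failure and worker are made up for illustration):

```racket
#lang racket/base
(require file/gunzip)

;; Sketch only: the pipe is closed on every exit path, and a failure in
;; the gunzip thread is re-raised in the calling thread.
(define (with-input-from-gzipped-file filename thunk
                                      #:buffer-size [buffer-size #f])
  (call-with-input-file filename
    (lambda (file-from)
      (define-values (pipe-from pipe-to) (make-pipe buffer-size))
      ;; Holds an exception caught inside the gunzip thread, if any.
      (define failure #f)
      (define worker
        (thread
         (lambda ()
           (with-handlers ([exn:fail? (lambda (e) (set! failure e))])
             (gunzip-through-ports file-from pipe-to))
           ;; Close the write end even after a failure so the reader
           ;; sees eof instead of blocking forever.
           (close-output-port pipe-to))))
      (dynamic-wind
       void
       (lambda ()
         (begin0
           (parameterize ([current-input-port pipe-from])
             (thunk))
           ;; Surface a gunzip error to the caller once the thunk is done.
           (when failure (raise failure))))
       (lambda ()
         ;; Runs on normal return and on escape: stop the worker and
         ;; release the read end of the pipe.
         (kill-thread worker)
         (close-input-port pipe-from))))))
```

With this shape, calling it on a non-gzipped file should let the thunk see
an immediate eof and then raise the original error in the caller, where an
ordinary with-handlers can catch it.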

Alternatively, I've now found this:
http://planet.racket-lang.org/display.ss?package=gzip.plt&owner=soegaard

It seems to do exactly what I need, albeit without the call-with-* forms,
but that's easy enough to wrap. With some very basic testing, it does seem
to be buffering, although it is a bit slower than the above. Not enough to
cause trouble, though.


On Mon, Aug 5, 2013 at 4:51 PM, Ryan Culpepper <ryanc at ccs.neu.edu> wrote:

> On 08/05/2013 04:29 PM, JP Verkamp wrote:
>
>> Is there a nice / idiomatic way to work with gzipped data in a streaming
>> manner (to avoid loading the rather large files into memory at once). So
>> far as I can tell, my code isn't doing that. It hangs for a while on the
>> call to gunzip-through-ports, long enough to uncompress the entire file,
>> then reads are pretty quick afterwards.
>>
>> Here's what I have thus far:
>>
>> #lang racket
>>
>> (require file/gunzip)
>>
>> (define-values (pipe-from pipe-to) (make-pipe))
>> (with-input-from-file "test.rkt.gz"
>>    (lambda ()
>>      (gunzip-through-ports (current-input-port) pipe-to)
>>      (for ([line (in-lines pipe-from)])
>>        (displayln line))))
>>
>
> You should probably 1) limit the size of the pipe (to stop it from
> inflating the whole file at once) and 2) put the gunzip-through-ports call
> in a separate thread. The gunzip thread will block when the pipe is full;
> when your program reads some data out of the pipe, the gunzip thread will
> be able to make some more progress. Something like this:
>
> (define-values (pipe-from pipe-to) (make-pipe 4000))
> (with-input-from-file "test.rkt.gz"
>   (lambda ()
>     (thread
>       (lambda ()
>         (gunzip-through-ports (current-input-port) pipe-to)
>         (close-output-port pipe-to)))
>
>     (for ([line (in-lines pipe-from)])
>       (displayln line))))
>
>> As an additional problem, that code doesn't actually work.
>> in-lines seems to be waiting for an eof-object? that
>> gunzip-through-ports isn't sending. Am I missing something? It ends up
>> just hanging after reading and printing the file.
>>
>
> The docs don't say anything about closing the port, so you'll probably
> have to do that yourself. In the code above, I added a call to
> close-output-port.
>
> Ryan
>
>

Posted on the users mailing list.