[racket] Cleaner way to work with gzipped data?

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Tue Aug 6 11:47:17 EDT 2013

You might consider using dynamic-wind instead of that with-handlers. Or,
instead of (error 'with-gunzip ...) just do (raise exn). That way you won't
lose the stack information in the original exception (which is likely the
one a user would want).
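
A minimal sketch of both suggestions, applied to the with-gunzip quoted below. The names with-gunzip/reraise and with-gunzip/dynamic-wind are hypothetical, chosen only for this illustration. It assumes that closing an already-closed port is harmless, which is how close-output-port and close-input-port are documented to behave:

```racket
#lang racket/base
(require file/gunzip)

;; Variant 1 (hypothetical name): keep with-handlers, but re-raise the
;; original exception so its message and context are preserved.
(define (with-gunzip/reraise thunk)
  (define-values (pipe-from pipe-to) (make-pipe))
  (with-handlers ([exn:fail?
                   (λ (err)
                     (close-output-port pipe-to)
                     (close-input-port pipe-from)
                     (raise err))])
    (gunzip-through-ports (current-input-port) pipe-to)
    (close-output-port pipe-to)
    (parameterize ([current-input-port pipe-from])
      (thunk))
    (close-input-port pipe-from)))

;; Variant 2 (hypothetical name): let dynamic-wind do the cleanup.
;; The post thunk runs on normal return and on any escape, so the
;; exception is never intercepted at all.
(define (with-gunzip/dynamic-wind thunk)
  (define-values (pipe-from pipe-to) (make-pipe))
  (dynamic-wind
    void
    (λ ()
      (gunzip-through-ports (current-input-port) pipe-to)
      (close-output-port pipe-to)
      (parameterize ([current-input-port pipe-from])
        (thunk)))
    (λ ()
      ;; Closing a port that is already closed has no effect.
      (close-output-port pipe-to)
      (close-input-port pipe-from))))
```

One difference to be aware of: the dynamic-wind variant returns whatever thunk returns, while the with-handlers variant returns the result of the final close-input-port. Also, because the post thunk runs on every exit, a thunk that captures a continuation and re-enters later would find the pipe closed.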

Robby


On Tue, Aug 6, 2013 at 10:40 AM, JP Verkamp <racket at jverkamp.com> wrote:

> Figured it out and cleaned it up. It turns out that I was using
> with-handlers oddly, but reading further through the documentation it
> works as expected. Here's a new version (generalized to any input-port):
>
> (define (with-gunzip thunk)
>   (define-values (pipe-from pipe-to) (make-pipe))
>   (with-handlers ([exn:fail?
>                    (λ (err)
>                      (close-output-port pipe-to)
>                      (close-input-port pipe-from)
>                      (error 'with-gunzip (exn-message err)))])
>     (gunzip-through-ports (current-input-port) pipe-to)
>     (close-output-port pipe-to)
>     (parameterize ([current-input-port pipe-from])
>       (thunk))
>     (close-input-port pipe-from)))
>
> If anyone's interested in a more in depth write up / source code for this
> and with-gzip:
> - writeup: http://blog.jverkamp.com/2013/08/06/adventures-in-racket-gzip/
> - source:
> https://github.com/jpverkamp/small-projects/tree/master/blog/with-gzip.rkt
>
>
> On Mon, Aug 5, 2013 at 5:36 PM, JP Verkamp <racket at jverkamp.com> wrote:
>
>> Thanks! make-pipe isn't something that I've had to use otherwise, so I
>> missed the optional parameter. That does certainly seem to help.
>>
>> Here's my first take of with-input-from-gzipped-file:
>>
>> (define (with-input-from-gzipped-file filename thunk #:buffer-size
>> [buffer-size #f])
>>   (call-with-input-file filename
>>     (lambda (file-from)
>>       (define-values (pipe-from pipe-to) (make-pipe buffer-size))
>>
>>       (thread
>>           (λ ()
>>             (gunzip-through-ports file-from pipe-to)
>>             (close-output-port pipe-to)))
>>
>>       (current-input-port pipe-from)
>>       (thunk)
>>       (close-input-port pipe-from))))
>>
>> The main thing missing is that there's no error handling (where the pipe
>> should still be closed). At the very least, if I try to call this on a
>> non-gzipped file, it breaks on the gunzip-through-ports line.
>> Theoretically, some variation of with-handlers should work (error should
>> raise an exn:fail?, yes?), but it doesn't seem to be helping.
>>
>> Any help with that?
>>
>> Alternatively, I've now found this:
>> http://planet.racket-lang.org/display.ss?package=gzip.plt&owner=soegaard
>>
>> It seems to do exactly what I need, albeit without the call-with-* forms,
>> but that's easy enough to wrap. With some very basic testing, it does seem
>> to be buffering though, although it is a bit slower than the above. Not
>> enough to cause trouble though.
>>
>>
>> On Mon, Aug 5, 2013 at 4:51 PM, Ryan Culpepper <ryanc at ccs.neu.edu> wrote:
>>
>>> On 08/05/2013 04:29 PM, JP Verkamp wrote:
>>>
>>>> Is there a nice / idiomatic way to work with gzipped data in a streaming
>>>> manner (to avoid loading the rather large files into memory at once)? So
>>>> far as I can tell, my code isn't doing that. It hangs for a while on the
>>>> call to gunzip-through-ports, long enough to uncompress the entire file,
>>>> then reads are pretty quick afterwards.
>>>>
>>>> Here's what I have thus far:
>>>>
>>>> #lang racket
>>>>
>>>> (require file/gunzip)
>>>>
>>>> (define-values (pipe-from pipe-to) (make-pipe))
>>>> (with-input-from-file "test.rkt.gz"
>>>>    (lambda ()
>>>>      (gunzip-through-ports (current-input-port) pipe-to)
>>>>      (for ([line (in-lines pipe-from)])
>>>>        (displayln line))))
>>>>
>>>
>>> You should probably 1) limit the size of the pipe (to stop it from
>>> inflating the whole file at once) and 2) put the gunzip-through-ports call
>>> in a separate thread. The gunzip thread will block when the pipe is full;
>>> when your program reads some data out of the pipe, the gunzip thread will
>>> be able to make some more progress. Something like this:
>>>
>>> (define-values (pipe-from pipe-to) (make-pipe 4000))
>>> (with-input-from-file "test.rkt.gz"
>>>   (lambda ()
>>>     (thread
>>>       (lambda ()
>>>         (gunzip-through-ports (current-input-port) pipe-to)
>>>         (close-output-port pipe-to)))
>>>
>>>     (for ([line (in-lines pipe-from)])
>>>       (displayln line))))
>>>
>>>> As an additional problem, that code doesn't actually work.
>>>> in-lines seems to be waiting for an eof-object? that
>>>> gunzip-through-ports isn't sending. Am I missing something? It ends up
>>>> just hanging after reading and printing the file.
>>>>
>>>
>>> The docs don't say anything about closing the port, so you'll probably
>>> have to do that yourself. In the code above, I added a call to
>>> close-output-port.
>>>
>>> Ryan
>>>
>>>
>>
>
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users
>
>

Posted on the users mailing list.