[racket] Cleaner way to work with gzipped data?

From: Greg Hendershott (greghendershott at gmail.com)
Date: Tue Aug 6 14:11:00 EDT 2013

I'll let Ryan comment on dynamic-wind usage. (I'd probably just use
with-handlers and raise.)
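
For what it's worth, here is roughly what I mean: a minimal sketch based
on the with-gunzip from earlier in the thread, with the handler
re-raising the original exception (as Robby suggested) instead of
calling error. The begin0 is my own addition so the thunk's result is
returned; treat the whole thing as untested-in-anger illustration rather
than a finished implementation.

```racket
#lang racket
(require file/gunzip)

;; Sketch: on failure, close both ends of the pipe, then re-raise the
;; original exception so its message and context are preserved.
(define (with-gunzip thunk)
  (define-values (pipe-from pipe-to) (make-pipe))
  (with-handlers ([exn:fail?
                   (λ (exn)
                     (close-output-port pipe-to)
                     (close-input-port pipe-from)
                     (raise exn))])
    ;; Inflate everything from the current input port into the pipe.
    (gunzip-through-ports (current-input-port) pipe-to)
    (close-output-port pipe-to)
    ;; Run the thunk reading from the pipe; return its result.
    (begin0
      (parameterize ([current-input-port pipe-from])
        (thunk))
      (close-input-port pipe-from))))
```

Closing an already-closed port is harmless in Racket, so the handler
doesn't need to check port-closed? first.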

A couple optional thoughts:

1. I notice that with-gzip and with-gunzip are identical except for
the function they call. You could parameterize that away: Replace both
with a single `with-pipe` function that takes any xxx-through-ports
style function (input-port? output-port? . -> . any) as an argument.
[1]

The bulk of the code is about using make-pipe with any such
"converter" function; it doesn't care which. This pattern of using
make-pipe could easily apply to functions like
transfer-{encoding|decoding}-through-ports and
content-{encoding|decoding}-through-ports for working with HTTP, and
to other transforms. It even works with the identity function for
this, `copy-port`.

1.5 Even if you still wanted to provide separate with-gzip and
with-gunzip functions, implementation could be thin wrappers over a
`with-pipe`.
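
A rough sketch of points 1 and 1.5, assuming with-gzip has the same
shape as the with-gunzip posted in this thread (I haven't checked the
actual source). The name `with-pipe` and its argument order are just my
suggestion, not an existing library function.

```racket
#lang racket
(require file/gzip file/gunzip)

;; with-pipe (hypothetical name): `convert` is any function of the form
;; (input-port? output-port? . -> . any). It reads from
;; (current-input-port) and writes into a pipe; `thunk` then runs with
;; (current-input-port) bound to the readable end of that pipe.
(define (with-pipe convert thunk)
  (define-values (pipe-from pipe-to) (make-pipe))
  (dynamic-wind
   void
   (λ ()
     (convert (current-input-port) pipe-to)
     (close-output-port pipe-to)
     (parameterize ([current-input-port pipe-from])
       (thunk)))
   (λ ()
     ;; Closing an already-closed port is a no-op.
     (close-output-port pipe-to)
     (close-input-port pipe-from))))

;; Thin wrappers over with-pipe (point 1.5).
(define (with-gunzip thunk)
  (with-pipe gunzip-through-ports thunk))

(define (with-gzip thunk)
  ;; gzip-through-ports needs the extra filename and timestamp arguments.
  (with-pipe (curryr gzip-through-ports #f 0) thunk))
```

A quick round trip such as
(with-input-from-string "hello\n" (λ () (with-gzip (λ () (with-gunzip read-line)))))
should hand the original line back, which makes for an easy sanity
check.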

2. There's a lot of thunking going on. On the plus side, traditional
things like with-input-from-file use thunks, so it's consistent with
that. On the other hand, a macro could simplify that so users
don't need to write (lambda () body...) or (thunk body...), they just
write body... directly.
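
To illustrate point 2, here is one way it might look: keep a
thunk-taking function and put a small macro in front of it. The name
`call-with-gunzip` is made up for this example, and the function body is
a simplified version of the with-gunzip in this thread (no error
handling), so take it as a sketch only.

```racket
#lang racket
(require file/gunzip)

;; Thunk-taking version (hypothetical name), simplified from the
;; with-gunzip posted earlier in the thread.
(define (call-with-gunzip thunk)
  (define-values (pipe-from pipe-to) (make-pipe))
  (gunzip-through-ports (current-input-port) pipe-to)
  (close-output-port pipe-to)
  (begin0
    (parameterize ([current-input-port pipe-from])
      (thunk))
    (close-input-port pipe-from)))

;; Macro front end: wraps the body in a thunk so callers don't have to.
(define-syntax-rule (with-gunzip body0 body ...)
  (call-with-gunzip (λ () body0 body ...)))
```

Users would then write
(with-gunzip (for ([line (in-lines)]) (displayln line)))
with no lambda in sight, while call-with-gunzip stays available for
anyone who wants to pass a function around.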


[1]: Yeah, it's awkward that gzip-through-ports takes those 2 extra
arguments for filename and time -- it's not just (input-port?
output-port? . -> . any) like the general `with-pipe` wants. But you
could pass either of the following to `with-pipe`:
(lambda (in out) (gzip-through-ports in out #f 0))
(curryr gzip-through-ports #f 0)



On Tue, Aug 6, 2013 at 12:16 PM, JP Verkamp <racket at jverkamp.com> wrote:
> I've never actually used dynamic-wind, although it does look interesting /
> like what I need. A few questions / caveats though:
>
> - Should the pipe be created in the pre-thunk or before the dynamic-wind
> entirely? The thunks don't seem to share scope, so I'm guessing the latter,
> but that seems a bit odd. I'm guessing the pre-thunk is for an entirely
> different use case though when you are actually dealing with closing and
> reopening resources and the like as control gets passed around.
>
> - Doesn't dynamic-wind break if the user messes with continuations during
> the value-thunk? So far as I understand, when control passes out, post-thunk
> is called and then pre-thunk on the way back in, but that means that when
> control returns the port will be closed. I don't know how often this will
> come up, but it seems to break if I nest a thread inside of the with-gzip
> call. Granted, my version did as well because of the close-input-port call.
> Is this just expected behavior?
>
> (And yes, it works fine in the more likely / sensible case of wrapping the
> entire with-gzip in a thread in both cases.)
>
> - So far as error rather than raise, raise was my original guess. But that
> added another layer of indirection to the stack trace which I didn't at
> first notice (I thought I wasn't even catching the error). It makes sense to
> have that though in the long run.
>
> That all being said, how does this version look?
>
> (define (with-gunzip thunk)
>   (define-values (pipe-from pipe-to) (make-pipe))
>   (dynamic-wind
>    void
>    (λ ()
>      (gunzip-through-ports (current-input-port) pipe-to)
>      (close-output-port pipe-to)
>      (parameterize ([current-input-port pipe-from])
>        (thunk)))
>    (λ ()
>      (unless (port-closed? pipe-to) (close-output-port pipe-to))
>      (unless (port-closed? pipe-from) (close-input-port pipe-from)))))
>
>
> On Tue, Aug 6, 2013 at 11:47 AM, Robby Findler <robby at eecs.northwestern.edu>
> wrote:
>>
>> You might consider using dynamic-wind instead of that with-handlers. Or,
>> instead of (error 'with-gunzip ...) just do (raise exn). That way you won't
>> lose the stack information in the original exception (which is likely the
>> one a user would want).
>>
>> Robby
>>
>>
>> On Tue, Aug 6, 2013 at 10:40 AM, JP Verkamp <racket at jverkamp.com> wrote:
>>>
>>> Figured it out and cleaned it up. It turns out that I was using
>>> with-handlers oddly, but reading further through the documentation it works
>>> as expected. Here's a new version (generalized to any input-port):
>>>
>>> (define (with-gunzip thunk)
>>>   (define-values (pipe-from pipe-to) (make-pipe))
>>>   (with-handlers ([exn:fail?
>>>                    (λ (err)
>>>                      (close-output-port pipe-to)
>>>                      (close-input-port pipe-from)
>>>                      (error 'with-gunzip (exn-message err)))])
>>>     (gunzip-through-ports (current-input-port) pipe-to)
>>>     (close-output-port pipe-to)
>>>     (parameterize ([current-input-port pipe-from])
>>>       (thunk))
>>>     (close-input-port pipe-from)))
>>>
>>> If anyone's interested in a more in depth write up / source code for this
>>> and with-gzip:
>>> - writeup: http://blog.jverkamp.com/2013/08/06/adventures-in-racket-gzip/
>>> - source:
>>> https://github.com/jpverkamp/small-projects/tree/master/blog/with-gzip.rkt
>>>
>>>
>>> On Mon, Aug 5, 2013 at 5:36 PM, JP Verkamp <racket at jverkamp.com> wrote:
>>>>
>>>> Thanks! make-pipe isn't something that I've had to use otherwise, so I
>>>> missed the optional parameter. That does certainly seem to help.
>>>>
>>>> Here's my first take of with-input-from-gzipped-file:
>>>>
>>>> (define (with-input-from-gzipped-file filename thunk #:buffer-size
>>>> [buffer-size #f])
>>>>   (call-with-input-file filename
>>>>     (lambda (file-from)
>>>>       (define-values (pipe-from pipe-to) (make-pipe buffer-size))
>>>>
>>>>       (thread
>>>>           (λ ()
>>>>             (gunzip-through-ports file-from pipe-to)
>>>>             (close-output-port pipe-to)))
>>>>
>>>>       (parameterize ([current-input-port pipe-from])
>>>>         (thunk))
>>>>       (close-input-port pipe-from))))
>>>>
>>>> The main thing missing is that there's no error handling (where the pipe
>>>> should still be closed). At the very least, if I try to call this on a
>>>> non-gzipped file, it breaks on the gunzip-through-ports line. Theoretically,
>>>> some variation of with-handlers should work (error should raise an
>>>> exn:fail?, yes?), but it doesn't seem to be helping.
>>>>
>>>> Any help with that?
>>>>
>>>> Alternatively, I've now found this:
>>>> http://planet.racket-lang.org/display.ss?package=gzip.plt&owner=soegaard
>>>>
>>>> It seems to do exactly what I need, albeit without the call-with-*
>>>> forms, but that's easy enough to wrap. With some very basic testing, it does
>>>> seem to be buffering though, although it is a bit slower than the above. Not
>>>> enough to cause trouble though.
>>>>
>>>>
>>>> On Mon, Aug 5, 2013 at 4:51 PM, Ryan Culpepper <ryanc at ccs.neu.edu>
>>>> wrote:
>>>>>
>>>>> On 08/05/2013 04:29 PM, JP Verkamp wrote:
>>>>>>
>>>>>> Is there a nice / idiomatic way to work with gzipped data in a
>>>>>> streaming
>>>>>> manner (to avoid loading the rather large files into memory at once).
>>>>>> So
>>>>>> far as I can tell, my code isn't doing that. It hangs for a while on
>>>>>> the
>>>>>> call to gunzip-through-ports, long enough to uncompress the entire
>>>>>> file,
>>>>>> then reads are pretty quick afterwards.
>>>>>>
>>>>>> Here's what I have thus far:
>>>>>>
>>>>>> #lang racket
>>>>>>
>>>>>> (require file/gunzip)
>>>>>>
>>>>>> (define-values (pipe-from pipe-to) (make-pipe))
>>>>>> (with-input-from-file "test.rkt.gz"
>>>>>>    (lambda ()
>>>>>>      (gunzip-through-ports (current-input-port) pipe-to)
>>>>>>      (for ([line (in-lines pipe-from)])
>>>>>>        (displayln line))))
>>>>>
>>>>>
>>>>> You should probably 1) limit the size of the pipe (to stop it from
>>>>> inflating the whole file at once) and 2) put the gunzip-through-ports call
>>>>> in a separate thread. The gunzip thread will block when the pipe is full;
>>>>> when your program reads some data out of the pipe, the gunzip thread will be
>>>>> able to make some more progress. Something like this:
>>>>>
>>>>> (define-values (pipe-from pipe-to) (make-pipe 4000))
>>>>> (with-input-from-file "test.rkt.gz"
>>>>>   (lambda ()
>>>>>     (thread
>>>>>       (lambda ()
>>>>>         (gunzip-through-ports (current-input-port) pipe-to)
>>>>>         (close-output-port pipe-to)))
>>>>>
>>>>>     (for ([line (in-lines pipe-from)])
>>>>>       (displayln line))))
>>>>>
>>>>>> As an additional problem, that code doesn't actually work.
>>>>>> in-lines seems to be waiting for an eof-object? that
>>>>>> gunzip-through-ports isn't sending. Am I missing something? It ends up
>>>>>> just hanging after reading and printing the file.
>>>>>
>>>>>
>>>>> The docs don't say anything about closing the port, so you'll probably
>>>>> have to do that yourself. In the code above, I added a call to
>>>>> close-output-port.
>>>>>
>>>>> Ryan
>>>>>
>>>>
>>>
>>>
>>> ____________________
>>>   Racket Users list:
>>>   http://lists.racket-lang.org/users
>>>
>>
>
>


Posted on the users mailing list.