[racket] Cleaner way to work with gzipped data?

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Fri Aug 9 15:57:37 EDT 2013

Oh, yes. Continuation barriers are a better way to do basically what I
wrote. Sorry, I should have mentioned that; I had forgotten about them.

Robby
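
For readers of the archive, the suggestions made across this thread (a size-limited pipe, a separate gunzip thread, dynamic-wind for cleanup, and a continuation barrier around the user's thunk) could be combined roughly as follows. This is only a sketch under assumptions, not code posted by anyone in the thread: the #:buffer-size keyword, its default of 4096, and the worker name are made up for illustration.

```racket
#lang racket
(require file/gunzip)

;; Sketch: run thunk with current-input-port replaced by the gunzipped
;; contents of the original current-input-port.
;; #:buffer-size and its default are assumptions for illustration.
(define (with-gunzip thunk #:buffer-size [buffer-size 4096])
  (define in (current-input-port))
  (define-values (pipe-from pipe-to) (make-pipe buffer-size))
  ;; Decompress in a separate thread; it blocks whenever the pipe is full.
  (define worker
    (thread
     (λ ()
       (dynamic-wind
        void
        (λ () (gunzip-through-ports in pipe-to))
        ;; Always close the write end so the reader sees eof,
        ;; even if gunzip raises (e.g. on non-gzipped input).
        (λ () (close-output-port pipe-to))))))
  (dynamic-wind
   void
   (λ ()
     ;; The barrier makes a later jump back into thunk's continuation
     ;; an error, instead of re-entering with a closed pipe.
     (call-with-continuation-barrier
      (λ ()
        (parameterize ([current-input-port pipe-from])
          (thunk)))))
   (λ ()
     (kill-thread worker)
     (close-input-port pipe-from))))
```

One possible use is (with-input-from-file "test.rkt.gz" (λ () (with-gunzip (λ () (for ([line (in-lines)]) (displayln line)))))). Note that in this sketch an error inside the gunzip thread is reported by that thread rather than raised in the caller, so the thunk simply sees an early eof.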


On Fri, Aug 9, 2013 at 1:05 PM, David Vanderson
<david.vanderson at gmail.com> wrote:

>  Is this the sort of situation that continuation barriers were made for?
> Do you have any guidance about using them?
>
>
> #lang racket
>
> (define (only-once thunk)
>   (dynamic-wind
>    (λ () (displayln "pre-thunk"))
>    (λ () (call-with-continuation-barrier thunk))
>    (λ () (displayln "post-thunk"))))
>
> (only-once (λ () (displayln "hi")))
>
>
> (let ([saved-k #f])
>   (only-once (λ () (let/cc k (set! saved-k k)
>                      (displayln "saving continuation"))))
>   (displayln "invoking continuation...")
>   (saved-k 11))
>
> Thanks,
> Dave
>
>
> On 08/08/2013 12:10 PM, Robby Findler wrote:
>
> As to the interactions with dynamic-wind and continuations: I did, as you
> figured out, intend for you to use only the post-thunk in the dynamic-wind
> (to close the pipes). In principle, you could use the pre-thunk to try to
> restore the pipe, but there really isn't enough information to do this
> correctly in all cases.
>
>  I see that you've protected things a little bit by using the
> port-closed? predicate as a guard, but if you did that to protect against
> possible continuation re-entry, then probably you're better off adding
> something to the pre-thunk that explicitly raises an error saying that it
> isn't allowed to re-enter. Something like this:
>
>  #lang racket
>
>  (define (only-once thunk)
>   (define already-in-once? #f)
>   (dynamic-wind
>    (λ ()
>      (when already-in-once? (error 'only-once "no no")))
>    (λ ()
>      (set! already-in-once? #t)
>      (thunk))
>    void))
>
>  (only-once (λ () "hi"))
>
>  (let ([saved-k #f])
>   (only-once (λ () (let/cc k (set! saved-k k))))
>   (saved-k 11))
>
>
>
>
> On Tue, Aug 6, 2013 at 11:16 AM, JP Verkamp <racket at jverkamp.com> wrote:
>
>> I've never actually used dynamic-wind, although it does look interesting
>> / like what I need. A few questions / caveats though:
>>
>>  - Should the pipe be created in the pre-thunk or before the dynamic-wind
>> entirely? The thunks don't seem to share scope, so I'm guessing the latter,
>> but that seems a bit odd. I'm guessing the pre-thunk is for an entirely
>> different use case though, when you are actually dealing with closing and
>> reopening resources or the like as control gets passed around.
>>
>> - Doesn't dynamic-wind break if the user messes with continuations
>> during the value-thunk? So far as I understand, when control passes out,
>> post-thunk is called and then pre-thunk on the way back in, but that
>> means that when control returns the port will be closed. I don't know how
>> often this will come up, but it seems to break if I nest a thread inside of
>> the with-gzip call. Granted, my version did as well because of the
>> close-input-port call. Is this just expected behavior?
>>
>>  (And yes, it works fine in the more likely / sensible case of wrapping
>> the entire with-gzip in a thread in both cases.)
>>
>>  - So far as error rather than raise, raise was my original guess. But
>> that added another layer of indirection to the stack trace which I didn't
>> at first notice (I thought I wasn't even catching the error). It makes
>> sense to have that though in the long run.
>>
>>  That all being said, how does this version look?
>>
>>  (define (with-gunzip thunk)
>>    (define-values (pipe-from pipe-to) (make-pipe))
>>    (dynamic-wind
>>     void
>>     (λ ()
>>       (gunzip-through-ports (current-input-port) pipe-to)
>>       (close-output-port pipe-to)
>>       (parameterize ([current-input-port pipe-from])
>>         (thunk)))
>>     (λ ()
>>       (unless (port-closed? pipe-to) (close-output-port pipe-to))
>>       (unless (port-closed? pipe-from) (close-input-port pipe-from)))))
>>
>>
>> On Tue, Aug 6, 2013 at 11:47 AM, Robby Findler <
>> robby at eecs.northwestern.edu> wrote:
>>
>>> You might consider using dynamic-wind instead of that with-handlers. Or,
>>> instead of (error 'with-gunzip ...) just do (raise exn). That way you won't
>>> lose the stack information in the original exception (which is likely the
>>> one a user would want).
>>>
>>> Robby
>>>
>>>
>>>  On Tue, Aug 6, 2013 at 10:40 AM, JP Verkamp <racket at jverkamp.com> wrote:
>>>
>>>>  Figured it out and cleaned it up. It turns out that I was using
>>>> with-handlers oddly, but reading further through the documentation it
>>>> works as expected. Here's a new version (generalized to any input-port):
>>>>
>>>>  (define (with-gunzip thunk)
>>>>   (define-values (pipe-from pipe-to) (make-pipe))
>>>>   (with-handlers ([exn:fail?
>>>>                    (λ (err)
>>>>                      (close-output-port pipe-to)
>>>>                      (close-input-port pipe-from)
>>>>                      (error 'with-gunzip (exn-message err)))])
>>>>     (gunzip-through-ports (current-input-port) pipe-to)
>>>>     (close-output-port pipe-to)
>>>>     (parameterize ([current-input-port pipe-from])
>>>>       (thunk))
>>>>     (close-input-port pipe-from)))
>>>>
>>>>  If anyone's interested in a more in depth write up / source code for
>>>> this and with-gzip:
>>>> - writeup:
>>>> http://blog.jverkamp.com/2013/08/06/adventures-in-racket-gzip/
>>>> - source:
>>>> https://github.com/jpverkamp/small-projects/tree/master/blog/with-gzip.rkt
>>>>
>>>>
>>>> On Mon, Aug 5, 2013 at 5:36 PM, JP Verkamp <racket at jverkamp.com> wrote:
>>>>
>>>>> Thanks! make-pipe isn't something that I've had to use otherwise, so
>>>>> I missed the optional parameter. That does certainly seem to help.
>>>>>
>>>>>  Here's my first take of with-input-from-gzipped-file:
>>>>>
>>>>>  (define (with-input-from-gzipped-file filename thunk
>>>>>                                         #:buffer-size [buffer-size #f])
>>>>>    (call-with-input-file filename
>>>>>      (lambda (file-from)
>>>>>        (define-values (pipe-from pipe-to) (make-pipe buffer-size))
>>>>>
>>>>>        (thread
>>>>>         (λ ()
>>>>>           (gunzip-through-ports file-from pipe-to)
>>>>>           (close-output-port pipe-to)))
>>>>>
>>>>>        (current-input-port pipe-from)
>>>>>        (thunk)
>>>>>        (close-input-port pipe-from))))
>>>>>
>>>>>  The main thing missing is that there's no error handling (where the
>>>>> pipe should still be closed). At the very least, if I try to call this on a
>>>>> non-gzipped file, it breaks on the gunzip-through-ports line.
>>>>> Theoretically, some variation of with-handlers should work (error
>>>>> should raise an exn:fail?, yes?), but it doesn't seem to be helping.
>>>>>
>>>>>  Any help with that?
>>>>>
>>>>>  Alternatively, I've now found this:
>>>>> http://planet.racket-lang.org/display.ss?package=gzip.plt&owner=soegaard
>>>>>
>>>>>  It seems to do exactly what I need, albeit without the call-with-*
>>>>> forms, but that's easy enough to wrap. With some very basic testing, it
>>>>> does seem to be buffering though, although it is a bit slower than the
>>>>> above. Not enough to cause trouble though.
>>>>>
>>>>>
>>>>> On Mon, Aug 5, 2013 at 4:51 PM, Ryan Culpepper <ryanc at ccs.neu.edu> wrote:
>>>>>
>>>>>> On 08/05/2013 04:29 PM, JP Verkamp wrote:
>>>>>>
>>>>>>> Is there a nice / idiomatic way to work with gzipped data in a
>>>>>>> streaming manner (to avoid loading the rather large files into
>>>>>>> memory at once)? So far as I can tell, my code isn't doing that. It
>>>>>>> hangs for a while on the call to gunzip-through-ports, long enough
>>>>>>> to uncompress the entire file, then reads are pretty quick
>>>>>>> afterwards.
>>>>>>>
>>>>>>> Here's what I have thus far:
>>>>>>>
>>>>>>> #lang racket
>>>>>>>
>>>>>>> (require file/gunzip)
>>>>>>>
>>>>>>> (define-values (pipe-from pipe-to) (make-pipe))
>>>>>>> (with-input-from-file "test.rkt.gz"
>>>>>>>    (lambda ()
>>>>>>>      (gunzip-through-ports (current-input-port) pipe-to)
>>>>>>>      (for ([line (in-lines pipe-from)])
>>>>>>>        (displayln line))))
>>>>>>>
>>>>>>
>>>>>>  You should probably 1) limit the size of the pipe (to stop it from
>>>>>> inflating the whole file at once) and 2) put the gunzip-through-ports call
>>>>>> in a separate thread. The gunzip thread will block when the pipe is full;
>>>>>> when your program reads some data out of the pipe, the gunzip thread will
>>>>>> be able to make some more progress. Something like this:
>>>>>>
>>>>>> (define-values (pipe-from pipe-to) (make-pipe 4000))
>>>>>> (with-input-from-file "test.rkt.gz"
>>>>>>   (lambda ()
>>>>>>     (thread
>>>>>>       (lambda ()
>>>>>>         (gunzip-through-ports (current-input-port) pipe-to)
>>>>>>         (close-output-port pipe-to)))
>>>>>>     (for ([line (in-lines pipe-from)])
>>>>>>       (displayln line))))
>>>>>>
>>>>>>> As an additional problem, that code doesn't actually work.
>>>>>>> in-lines seems to be waiting for an eof-object? that
>>>>>>> gunzip-through-ports isn't sending. Am I missing something? It ends
>>>>>>> up just hanging after reading and printing the file.
>>>>>>>
>>>>>>
>>>>>>  The docs don't say anything about closing the port, so you'll
>>>>>> probably have to do that yourself. In the code above, I added a call to
>>>>>> close-output-port.
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>  ____________________
>>>>   Racket Users list:
>>>>   http://lists.racket-lang.org/users

Posted on the users mailing list.