[racket] Cleaner way to work with gzipped data?
I've never actually used dynamic-wind, although it does look interesting /
like what I need. A few questions / caveats though:
- Should the pipe be created in the pre-thunk or before the
dynamic-wind entirely? The thunks don't seem to share scope, so I'm
guessing the latter, but that seems a bit odd. I'm guessing the pre-thunk
is for an entirely different use case, though: actually closing and
reopening resources and the like as control gets passed around.
- Doesn't dynamic-wind break if the user messes with continuations during
the value-thunk? So far as I understand, when control passes out,
post-thunk is called, and then pre-thunk on the way back in, but that
means that when control returns the port will be closed. I don't know how
often this will come up, but it seems to break if I nest a thread inside
of the with-gzip call. Granted, my version did as well because of the
close-input-port call. Is this just expected behavior?
(And yes, it works fine in the more likely / sensible case of wrapping the
entire with-gzip in a thread in both cases.)
- As for error rather than raise, raise was my original guess. But that
added another layer of indirection to the stack trace, which I didn't
notice at first (I thought I wasn't even catching the error). It makes
sense to have that in the long run, though.
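A small sketch, assuming plain racket/base, of when the post-thunk actually runs. `events`, `note!`, and `with-cleanup` are hypothetical helpers made up for this demo. The post-thunk runs on a normal return, on an escape through let/ec, and on an exception. Because dynamic-wind never touches the exception, the handler outside receives the very same exn object that was raised, so nothing is lost the way it is when re-wrapping with error. The sketch does not cover re-entering the body through a full continuation. With void as the pre-thunk nothing is reopened, so a re-entered body would see closed ports.

```racket
#lang racket/base
;; Sketch: record when dynamic-wind's post-thunk runs.
;; `events`, `note!`, and `with-cleanup` are hypothetical demo helpers.
(define events '())
(define (note! x) (set! events (cons x events)))

(define (with-cleanup body)
  (dynamic-wind
   void                        ; pre-thunk: nothing to (re)acquire
   body
   (λ () (note! 'cleanup))))   ; post-thunk: runs on every exit from body

;; 1. Normal return.
(with-cleanup (λ () (note! 'normal)))

;; 2. Jumping out through an escape continuation.
(let/ec k
  (with-cleanup (λ () (k 'escaped) (note! 'unreachable))))

;; 3. An exception: cleanup still runs, and the handler outside gets the
;;    very same exn object that was raised, since nothing re-wraps it.
(define original (make-exn:fail "boom" (current-continuation-marks)))
(define caught
  (with-handlers ([exn:fail? (λ (e) e)])
    (with-cleanup (λ () (raise original)))))

(reverse events)       ; → '(normal cleanup cleanup cleanup)
(eq? caught original)  ; → #t
```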
That all being said, how does this version look?
(define (with-gunzip thunk)
  (define-values (pipe-from pipe-to) (make-pipe))
  (dynamic-wind
   void
   (λ ()
     (gunzip-through-ports (current-input-port) pipe-to)
     (close-output-port pipe-to)
     (parameterize ([current-input-port pipe-from])
       (thunk)))
   (λ ()
     (unless (port-closed? pipe-to) (close-output-port pipe-to))
     (unless (port-closed? pipe-from) (close-input-port pipe-from)))))
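For comparison, the version above still inflates the whole input before (thunk) runs, because gunzip-through-ports is called synchronously on an unbounded pipe. Below is a sketch that combines Ryan's suggestion further down (a size-limited pipe plus a separate gunzip thread) with dynamic-wind for cleanup. The name with-gunzip/streaming, the #:buffer-size keyword, and the 4096-byte default are assumptions for illustration, not anything provided by file/gunzip. Errors in the worker thread are captured and re-raised as the original exn only after the body returns. As with the version above, re-entering the body through a continuation after it has exited will find the pipe closed and the worker killed.

```racket
#lang racket/base
(require file/gunzip)

;; Sketch only: `with-gunzip/streaming` is a hypothetical name, and the
;; 4096-byte default pipe limit is an arbitrary choice.
(define (with-gunzip/streaming thunk #:buffer-size [buffer-size 4096])
  ;; Bounded pipe: the writer blocks once `buffer-size` bytes are unread.
  (define-values (pipe-from pipe-to) (make-pipe buffer-size))
  (define source (current-input-port))
  (define failure #f)
  ;; Decompress in a background thread, which can only run ahead of the
  ;; reader as far as the pipe limit allows.
  (define worker
    (thread
     (λ ()
       (with-handlers ([exn:fail? (λ (e) (set! failure e))])
         (gunzip-through-ports source pipe-to))
       ;; Closing the write end is what lets readers see eof.
       (close-output-port pipe-to))))
  (begin0
    (dynamic-wind
     void
     (λ ()
       (parameterize ([current-input-port pipe-from])
         (thunk)))
     (λ ()
       ;; Stop the worker if the body returned early, then release the pipe.
       (kill-thread worker)
       (close-input-port pipe-from)))
    ;; Re-raise the worker's original exception (if any) in the caller.
    (when failure (raise failure))))
```

Usage is the same as before, e.g. (with-input-from-file "test.rkt.gz" (λ () (with-gunzip/streaming (λ () (for ([line (in-lines)]) (displayln line)))))). Killing the worker in the post-thunk means an early return, such as reading only the first line, does not leave a thread blocked forever on a full pipe.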
On Tue, Aug 6, 2013 at 11:47 AM, Robby Findler
<robby at eecs.northwestern.edu> wrote:
> You might consider using dynamic-wind instead of that with-handlers. Or,
> instead of (error 'with-gunzip ...) just do (raise exn). That way you won't
> lose the stack information in the original exception (which is likely the
> one a user would want).
>
> Robby
>
>
> On Tue, Aug 6, 2013 at 10:40 AM, JP Verkamp <racket at jverkamp.com> wrote:
>
>> Figured it out and cleaned it up. It turns out that I was using
>> with-handlers oddly, but reading further through the documentation, it
>> works as expected. Here's a new version (generalized to any input port):
>>
>> (define (with-gunzip thunk)
>>   (define-values (pipe-from pipe-to) (make-pipe))
>>   (with-handlers ([exn:fail?
>>                    (λ (err)
>>                      (close-output-port pipe-to)
>>                      (close-input-port pipe-from)
>>                      (error 'with-gunzip (exn-message err)))])
>>     (gunzip-through-ports (current-input-port) pipe-to)
>>     (close-output-port pipe-to)
>>     (parameterize ([current-input-port pipe-from])
>>       (thunk))
>>     (close-input-port pipe-from)))
>>
>> If anyone's interested in a more in-depth write-up / source code for this
>> and with-gzip:
>> - writeup: http://blog.jverkamp.com/2013/08/06/adventures-in-racket-gzip/
>> - source:
>> https://github.com/jpverkamp/small-projects/tree/master/blog/with-gzip.rkt
>>
>>
>> On Mon, Aug 5, 2013 at 5:36 PM, JP Verkamp <racket at jverkamp.com> wrote:
>>
>>> Thanks! make-pipe isn't something that I've had to use otherwise, so I
>>> missed the optional parameter. That does certainly seem to help.
>>>
>>> Here's my first take of with-input-from-gzipped-file:
>>>
>>> (define (with-input-from-gzipped-file filename thunk
>>>                                       #:buffer-size [buffer-size #f])
>>>   (call-with-input-file filename
>>>     (lambda (file-from)
>>>       (define-values (pipe-from pipe-to) (make-pipe buffer-size))
>>>
>>>       (thread
>>>        (λ ()
>>>          (gunzip-through-ports file-from pipe-to)
>>>          (close-output-port pipe-to)))
>>>
>>>       (current-input-port pipe-from)
>>>       (thunk)
>>>       (close-input-port pipe-from))))
>>>
>>> The main thing missing is that there's no error handling (where the pipe
>>> should still be closed). At the very least, if I try to call this on a
>>> non-gzipped file, it breaks on the gunzip-through-ports line.
>>> Theoretically, some variation of with-handlers should work (error should
>>> raise an exn:fail?, yes?), but it doesn't seem to be helping.
>>>
>>> Any help with that?
>>>
>>> Alternatively, I've now found this:
>>> http://planet.racket-lang.org/display.ss?package=gzip.plt&owner=soegaard
>>>
>>> It seems to do exactly what I need, albeit without the call-with-*
>>> forms, but that's easy enough to wrap. With some very basic testing, it
>>> does seem to be buffering, although it is a bit slower than the
>>> above. Not enough to cause trouble, though.
>>>
>>>
>>> On Mon, Aug 5, 2013 at 4:51 PM, Ryan Culpepper <ryanc at ccs.neu.edu> wrote:
>>>
>>>> On 08/05/2013 04:29 PM, JP Verkamp wrote:
>>>>
>>>>> Is there a nice / idiomatic way to work with gzipped data in a
>>>>> streaming manner (to avoid loading the rather large files into memory
>>>>> at once)? So far as I can tell, my code isn't doing that. It hangs for
>>>>> a while on the call to gunzip-through-ports, long enough to uncompress
>>>>> the entire file, then reads are pretty quick afterwards.
>>>>>
>>>>> Here's what I have thus far:
>>>>>
>>>>> #lang racket
>>>>>
>>>>> (require file/gunzip)
>>>>>
>>>>> (define-values (pipe-from pipe-to) (make-pipe))
>>>>> (with-input-from-file "test.rkt.gz"
>>>>>   (lambda ()
>>>>>     (gunzip-through-ports (current-input-port) pipe-to)
>>>>>     (for ([line (in-lines pipe-from)])
>>>>>       (displayln line))))
>>>>>
>>>>
>>>> You should probably 1) limit the size of the pipe (to stop it from
>>>> inflating the whole file at once) and 2) put the gunzip-through-ports call
>>>> in a separate thread. The gunzip thread will block when the pipe is full;
>>>> when your program reads some data out of the pipe, the gunzip thread will
>>>> be able to make some more progress. Something like this:
>>>>
>>>> (define-values (pipe-from pipe-to) (make-pipe 4000))
>>>> (with-input-from-file "test.rkt.gz"
>>>>   (lambda ()
>>>>     (thread
>>>>      (lambda ()
>>>>        (gunzip-through-ports (current-input-port) pipe-to)
>>>>        (close-output-port pipe-to)))
>>>>
>>>>     (for ([line (in-lines pipe-from)])
>>>>       (displayln line))))
>>>>
>>>>> As an additional problem, that code doesn't actually work.
>>>>> in-lines seems to be waiting for an eof-object? that
>>>>> gunzip-through-ports isn't sending. Am I missing something? It ends up
>>>>> just hanging after reading and printing the file.
>>>>>
>>>>
>>>> The docs don't say anything about closing the port, so you'll probably
>>>> have to do that yourself. In the code above, I added a call to
>>>> close-output-port.
>>>>
>>>> Ryan
>>>>
>>>>
>>>
>>
>> ____________________
>> Racket Users list:
>> http://lists.racket-lang.org/users
>>
>>
>