[racket] Understanding GC when working with streams
Hello,
I'm trying to understand how memory is allocated and collected when
working with streams.
I recently asked a question on Stack Overflow about how to limit memory
when using streams, and got two good answers:
http://stackoverflow.com/questions/18629188/how-to-limit-memory-use-when-using-a-stream
However, I'm seeking a better understanding than could really be given
through the SO format.
I want to use streams because I have too much data to fit in memory and
hence want to use them to bring in the data from files and databases
sequentially as needed. However, I'm finding that the GC is not
collecting as I would have hoped and hence streams are not quite as
straightforward a solution as I expected. The sort of problems that I am
experiencing are demonstrated with the following code:
#lang racket
(require rackunit)
; This program fails with out-of-memory errors when the memory limit is
; set to 128 MB. It always fails when it comes to testing filtered-nums,
; regardless of how test-nums? and test-gen-filtered-nums? have been set.
; However, test-for/sum-gen-filtered-nums? also fails if set.
(define max-num 10000000)
(define test-nums? #f)
(define test-gen-filtered-nums? #f)
(define test-for/sum-gen-filtered-nums? #f)
(define nums (in-range max-num))
(define filtered-nums
  (stream-filter (λ (i) (values #t)) nums))
(define (gen-filtered-nums)
  (stream-filter (λ (i) (values #t)) nums))
(when test-nums?
  (displayln "Testing nums")
  (check-equal? max-num (stream-length nums)))
(when test-gen-filtered-nums?
  (displayln "Testing gen-filtered-nums")
  (check-equal? max-num (stream-length (gen-filtered-nums))))
(when test-for/sum-gen-filtered-nums?
  (displayln "Testing with for/sum-gen-filtered-nums ")
  (check-equal? max-num (for/sum ([i (gen-filtered-nums)]) 1)))
(displayln "Testing filtered-nums")
(check-equal? max-num (stream-length filtered-nums))
I understand that making multiple passes through a large data set is
inefficient, but here I am trying to gain a better understanding of the
GC. So this leads me to a few related questions:
i. Why does the GC seem to collect more effectively when the stream is
created in a function as opposed to in a straight definition? i.e.
test-gen-filtered-nums? passes, although I note that
test-for/sum-gen-filtered-nums? doesn't.
ii. Is stream-filter inappropriate to use with big data sets?
iii. Is there a better choice than streams for dealing with big data
sets, coming from disparate sources such as files, databases, etc.,
within Racket?
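To make question iii concrete, here is a minimal sketch of the kind of
single-pass alternative I have in mind, assuming a line-oriented text
file. The helper name count-matching-lines and the file name in the
usage comment are made up for illustration; it relies only on in-lines
and for/sum, and I have not checked how it behaves under the same memory
limit.

```racket
#lang racket

;; Hypothetical helper (not part of the program above): counts the lines
;; of the file at `path` that satisfy `keep?`, reading one line at a time.
;; No binding refers to the sequence as a whole, only to the current line.
(define (count-matching-lines path keep?)
  (call-with-input-file path
    (λ (in)
      (for/sum ([line (in-lines in)])
        (if (keep? line) 1 0)))))

;; Example use, assuming a file named "data.txt" exists:
;; (count-matching-lines "data.txt" non-empty-string?)
```

The idea is that the for loop consumes the port directly, so nothing
plays the role that the filtered-nums definition plays in the program
above.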
Thanks
Lorry
--
vLife Systems Ltd
Registered Office: The Meridian, 4 Copthall House, Station Square, Coventry, CV1 2FL
Registered in England and Wales No. 06477649
http://vlifesystems.com