<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hello,<br>
<br>
I'm trying to understand how memory is allocated and collected when
working with streams.<br>
I recently asked a question on Stack Overflow about how to limit memory
use when using streams<br>
and got two good answers:<br>
<a
href="http://stackoverflow.com/questions/18629188/how-to-limit-memory-use-when-using-a-stream">http://stackoverflow.com/questions/18629188/how-to-limit-memory-use-when-using-a-stream</a><br>
<br>
However, I'm seeking a better understanding than could really be
given through the SO format.<br>
I want to use streams because I have too much data to fit in memory,
so I need to<br>
bring in the data from files and databases sequentially as
needed. However, I'm finding that<br>
the GC is not collecting as I had hoped, so streams are
not quite as straightforward<br>
a solution as I expected. The sort of problems that I am
experiencing are demonstrated by the<br>
following code:<br>
<br>
#lang racket<br>
(require rackunit)<br>
<br>
; This program fails with out-of-memory errors when the memory limit
is set to 128 MB.<br>
; It always fails when it reaches the filtered-nums test,
regardless of how test-nums?<br>
; and test-gen-filtered-nums? are set. However,
test-for/sum-gen-filtered-nums?<br>
; also fails if set.<br>
<br>
(define max-num 10000000)<br>
(define test-nums? #f)<br>
(define test-gen-filtered-nums? #f)<br>
(define test-for/sum-gen-filtered-nums? #f)<br>
<br>
(define nums (in-range max-num))<br>
(define filtered-nums<br>
(stream-filter (λ (i) (values #t)) nums))<br>
<br>
(define (gen-filtered-nums)<br>
(stream-filter (λ (i) (values #t)) nums))<br>
<br>
(when test-nums?<br>
(displayln "Testing nums")<br>
(check-equal? max-num (stream-length nums)))<br>
<br>
(when test-gen-filtered-nums?<br>
(displayln "Testing gen-filtered-nums")<br>
(check-equal? max-num (stream-length (gen-filtered-nums))))<br>
<br>
(when test-for/sum-gen-filtered-nums?<br>
(displayln "Testing for/sum-gen-filtered-nums")<br>
(check-equal? max-num (for/sum ([i (gen-filtered-nums)]) 1)))<br>
<br>
<br>
(displayln "Testing filtered-nums")<br>
(check-equal? max-num (stream-length filtered-nums))<br>
<br>
<br>
I understand that making multiple passes through a large data set is
inefficient,<br>
but here I am trying to gain a better understanding of the GC.
This leads<br>
me to a few related questions:<br>
<br>
i. Why does the GC seem to collect more effectively when the
stream is<br>
created in a function as opposed to in a top-level definition?
That is,<br>
the test-gen-filtered-nums? test passes, although I note that the<br>
test-for/sum-gen-filtered-nums? test doesn't.<br>
ii. Is stream-filter inappropriate to use with big data sets?<br>
iii. Is there a better choice than streams for dealing with big
data sets coming from<br>
disparate sources such as files, databases, etc., within
Racket?<br>
<br>
<br>
Thanks<br>
<br>
<br>
<br>
Lorry<br>
<br>
<br>
<pre class="moz-signature" cols="72">--
vLife Systems Ltd
Registered Office: The Meridian, 4 Copthall House, Station Square, Coventry, CV1 2FL
Registered in England and Wales No. 06477649
<a class="moz-txt-link-freetext" href="http://vlifesystems.com">http://vlifesystems.com</a>
</pre>
</body>
</html>