[plt-scheme] PLT with very large datasets?
Yavuz Arkun wrote:
>
> If I cannot work in RAM, is there a way to memory map the data
> efficiently, using standard facilities of PLT Scheme?
Sounds like working in RAM is not a good idea -- all you need is for
Netflix to stay around a little longer and get some more customers,
and you end up with twice the space and more.  Memory-mapping might
be difficult to do, and I don't have any experience with that -- but
I'm not sure it would be the right thing either.  Depending on your
algorithm, you might want better control over what stays in memory
for a particular chunk of code.
So I think that your best bet is to do the usual work: read the data
and save it in an indexed file, then create a wrapper that lets you
deal with persistent values from the file and handles the reading and
caching automatically and efficiently.  It sounds like you only need
to read entries -- and if they're fixed-length, you don't even need to
deal with indexing, since the offset of each record can be computed
directly.
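For example, here is a minimal sketch of such a reader -- the 9-byte
record layout (4-byte user id, 4-byte movie id, 1-byte rating) and the
file name are just assumptions, adjust them to your actual format:

  ;; Each record is assumed to be 9 bytes: two unsigned 32-bit ints
  ;; followed by a single rating byte.
  (define record-size 9)

  (define in (open-input-file "ratings.dat"))

  ;; With fixed-length records the byte offset of record N is just
  ;; N * record-size, so seek there and read one record.
  (define (read-record n)
    (file-position in (* n record-size))
    (let ([bs (read-bytes record-size in)])
      (list (integer-bytes->integer (subbytes bs 0 4) #f #t)  ; user id
            (integer-bytes->integer (subbytes bs 4 8) #f #t)  ; movie id
            (bytes-ref bs 8))))                               ; rating

A real wrapper would add some caching on top of this, but the point is
that only the record you ask for ever needs to be in memory.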
On Jun 29, Chongkai Zhu wrote:
> My 2 cents:
>
> 1. PLT's GC starts to work when you use half of the memory, so if
> your actual data consumes 1G, you need at least 2G of memory.
>
> 2. You said "in theory, about 9 bytes per triplet", but Scheme is a
> dynamically typed language, which means some memory is used for type
> tags.
One way to cut the memory down to the minimum needed is to use a
homogeneous numeric vector (either through srfi-4 or the foreign
interface).  This means that only the numbers are stored, and some
wrapper functions should make it easy to treat the result as a vector
of triplets.
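A rough sketch of that wrapper idea, assuming srfi-4's u32vectors and
three unsigned 32-bit cells per triplet (your actual element types may
well differ):

  (require (lib "4.ss" "srfi"))  ; `(require srfi/4)' on newer versions

  ;; One flat numeric vector, three cells per triplet -- no per-element
  ;; boxing or type tags.
  (define (make-triplets n) (make-u32vector (* 3 n)))

  (define (triplet-set! v i user movie rating)
    (u32vector-set! v (* 3 i) user)
    (u32vector-set! v (+ (* 3 i) 1) movie)
    (u32vector-set! v (+ (* 3 i) 2) rating))

  (define (triplet-ref v i)
    (list (u32vector-ref v (* 3 i))
          (u32vector-ref v (+ (* 3 i) 1))
          (u32vector-ref v (+ (* 3 i) 2))))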
Yet another, similar option is to just use one big byte string holding
the file data, and use `floating-point-bytes->real' to read numbers
from specific positions.
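Something like this, assuming (just for the sketch) that the file
holds consecutive 8-byte big-endian doubles and is called
"ratings.dat":

  ;; Slurp the whole file into one byte string.
  (define data
    (with-input-from-file "ratings.dat"
      (lambda () (read-bytes (file-size "ratings.dat")))))

  ;; Pull the Nth double straight out of the byte string; only the
  ;; resulting flonum is allocated, the data itself stays untagged.
  (define (data-ref n)
    (floating-point-bytes->real data #t (* n 8) (* (add1 n) 8)))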
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!