[plt-scheme] Rebuilding Large Files from Small Pieces

From: Synx (plt at synx.us.to)
Date: Tue Feb 23 02:51:45 EST 2010

I don't have a question here; I'm just describing an interesting
technique I've been learning for storing and retrieving large files in a
database whose values have a small, fixed maximum size.

Let's say I have a simple database of key:value pairs thus:

1 : "this"
2 : " is "
3 : "a te"
4 : "st o"
5 : "f th"
6 : "e fo"
7 : "rum"
8 : "1234"
9 : "567"
10: ".389"

Let's say each row is a separate file, by the name of the key,
containing a serialized form of the value. Each file can be no more than
5 bytes long. What I want to do is print out the phrase "this is a test
of the forum", but I only start with the number 10.
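
The layout above can be sketched as follows. This is a minimal sketch in
Python rather than Scheme; the names ROWS, MAX_BYTES, write_store, and
read_value are made up for illustration, and a temporary directory
stands in for wherever the files would really live.

```python
import os
import tempfile

MAX_BYTES = 5  # the per-file size limit from the example

# The example rows; values are raw bytes so the spaces are preserved.
ROWS = {
    "1": b"this", "2": b" is ", "3": b"a te", "4": b"st o",
    "5": b"f th", "6": b"e fo", "7": b"rum",
    "8": b"1234", "9": b"567", "10": b".389",
}

def write_store(directory, rows):
    """Write each row to a file named after its key, rejecting oversized values."""
    for key, value in rows.items():
        if len(value) > MAX_BYTES:
            raise ValueError(f"value for key {key} exceeds {MAX_BYTES} bytes")
        with open(os.path.join(directory, key), "wb") as f:
            f.write(value)

def read_value(directory, key):
    """Return the full contents of the file for one key."""
    with open(os.path.join(directory, key), "rb") as f:
        return f.read()

store_dir = tempfile.mkdtemp()  # assumption: a throwaway directory for the demo
write_store(store_dir, ROWS)
print(read_value(store_dir, "10"))
```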

Starting with 10 means I open a port to file #10. I can see it starts
with a dot, which I use to indicate that it's the top of a hash tree.
The next number, 3, indicates the depth of the hash tree. Then I read
the number 8 and open a port to file #8. It's only on the 2nd of 3
levels, so it must be another list of keys; I read the number 1 and open
a port to file #1. I still have data to read from #8 and #10, so their
ports are still open. (Or should I just read all the data from #10 and
iterate through it?)

I can assume file #1 is not a hash-list because it's on the 3rd level of
recursion, so it must be part of the actual data I'm looking for. I read
file #1 and write its contents to an output port. So far I have

> this

Next, from file #8, I read the number 2 and open file #2, which adds
" is " for:

> this is

I continue through 3 and 4, ending up with

> this is a test o

and then I'm done with #8. Closing its port, I return to iterating
through #10's values. I read a 9 from #10 and open a port to file #9.
From there I can read the numbers 5, 6, and 7, outputting the contents
of those files in sequence. Now I have

> this is a test of the forum

successfully. Finished with #7, I return to the context of #9. Finished
with #9 I return to #10. Finished with #10, I return to wherever this
routine was invoked.
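
The walk above can be sketched as a short recursive routine. This is a
minimal sketch in Python rather than Scheme, keeping the rows in a
dictionary instead of files. The names STORE, rebuild, and emit are made
up for illustration, and it assumes every key listed inside a hash-list
is a single digit, as in the example.

```python
import io

# The example rows, held in memory for brevity (assumption: files would
# behave the same way, one value per key).
STORE = {
    "1": b"this", "2": b" is ", "3": b"a te", "4": b"st o",
    "5": b"f th", "6": b"e fo", "7": b"rum",
    "8": b"1234", "9": b"567", "10": b".389",
}

def emit(store, key, level, depth, out):
    """Write the data under `key`, which sits on `level` of a `depth`-level tree."""
    value = store[key]
    if level == depth:
        # Bottom level: this value is a piece of the actual data.
        out.write(value)
        return
    # Otherwise the value is a list of single-digit child keys.
    for child in value.decode("ascii"):
        emit(store, child, level + 1, depth, out)

def rebuild(store, root_key, out):
    """Start from the root value: a dot, a depth digit, then child keys."""
    top = store[root_key]
    if top[:1] != b".":
        raise ValueError("root value must start with '.'")
    depth = int(top[1:2].decode("ascii"))
    for child in top[2:].decode("ascii"):
        emit(store, child, 2, depth, out)  # the root's children are on level 2

out = io.BytesIO()
rebuild(STORE, "10", out)
print(out.getvalue().decode("ascii"))
```

Because each call returns to its caller when its list is exhausted, the
order of returns is the same as in the walk: #7, then #9, then #10.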

I'd make a Scheme example of this, but I can't figure out how to send
the rebuilt file to an arbitrary output, which might mean decoding it,
displaying it on a screen, playing it over loudspeakers, or who knows
what. I also can't decide whether to read the keys in a hash-list file
all at once and then close the file, or one at a time, leaving the file
open while I descend. I'm not sure which would be more resource
intensive. I suspect leaving the files open would be worse for both
memory and performance, but I can't quite justify why it would be
incorrect to do so.
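
Both open points can at least be made concrete. The following is a
sketch in Python rather than Scheme, under some loud assumptions: the
output is modelled as a "sink", any one-argument procedure that accepts
a chunk of bytes (a decoder, a display routine, or an audio player could
all fit that shape), and CountingStore is a hypothetical helper that
only exists to count how many files are open at once. Keys inside a
hash-list are assumed to be single digits, as in the example.

```python
import os
import tempfile

ROWS = {
    "1": b"this", "2": b" is ", "3": b"a te", "4": b"st o",
    "5": b"f th", "6": b"e fo", "7": b"rum",
    "8": b"1234", "9": b"567", "10": b".389",
}

class CountingStore:
    """Hypothetical file-per-key store that records the peak number of open files."""

    def __init__(self, rows):
        self.directory = tempfile.mkdtemp()
        for key, value in rows.items():
            with open(os.path.join(self.directory, key), "wb") as f:
                f.write(value)
        self.open_now = 0
        self.peak = 0

    def open(self, key):
        self.open_now += 1
        self.peak = max(self.peak, self.open_now)
        return open(os.path.join(self.directory, key), "rb")

    def close(self, f):
        f.close()
        self.open_now -= 1

def read_header(f):
    """Consume the '.' marker and the depth digit from an open root file."""
    if f.read(1) != b".":
        raise ValueError("root file must start with '.'")
    return int(f.read(1).decode("ascii"))

def rebuild_eager(store, key, sink, level=1, depth=None):
    """Read the whole file, close it, and only then descend."""
    f = store.open(key)
    try:
        if level == 1:
            depth = read_header(f)
        body = f.read()
    finally:
        store.close(f)
    if level == depth:
        sink(body)
        return
    for child in body.decode("ascii"):
        rebuild_eager(store, child, sink, level + 1, depth)

def rebuild_lazy(store, key, sink, level=1, depth=None):
    """Read one key at a time, keeping this file open while descending."""
    f = store.open(key)
    try:
        if level == 1:
            depth = read_header(f)
        if level == depth:
            sink(f.read())
            return
        while True:
            child = f.read(1)
            if not child:
                break
            rebuild_lazy(store, child.decode("ascii"), sink, level + 1, depth)
    finally:
        store.close(f)

eager_store, eager_chunks = CountingStore(ROWS), []
rebuild_eager(eager_store, "10", eager_chunks.append)

lazy_store, lazy_chunks = CountingStore(ROWS), []
rebuild_lazy(lazy_store, "10", lazy_chunks.append)

print(b"".join(eager_chunks), eager_store.peak)
print(b"".join(lazy_chunks), lazy_store.peak)
```

Both versions produce the same bytes. In this sketch the read-all
version never has more than one file open, while the one-at-a-time
version holds one open file per level of the tree, three here. That
says nothing about speed, only about how many ports are open at once.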


Posted on the users mailing list.