[plt-scheme] Effect of caching using hash-tables vs vectors
On Mar 19, jos koot wrote:
> Hi Eli and Jens Axel,
> Thanks for your responses. The idea of copying a vector into a
> larger one is straight forward. But when looking for the maximum
> cycle for the numbers 1 to 1e7, the size of the vector will grow
> beyond practical limits. Even when storing the cycles of odd numbers
> only, computing the maximum cycle for the numbers 1 up to but not
> including 1e7 would require a vector of length (/ (sub1
> 20114203639877) 2) = 10057101819938 = about 1e13. I managed to
> compute this by memorizing the cycles of odd numbers <= 250000001 in
> a vector and hashing the others (requiring 107176 hash-table
> entries). The vector gives a hit/lookup rate of about 56%. The
> hash-table is not very effective, for it provides a hit/lookup rate
> of about 0.3%. Indeed, without the hash, the computation takes about
> the same amount of time. It was a nice exercise.
I think that my line started by observing that you can get rid of more
than just the even numbers. Something like this:
* You test all numbers, starting from 1 and going up.
* By the time you reach N, you know that sequences that started with
1..N-1 all converged, so you don't need to test the sull Collatz
sequence that starts at N -- it's enough to stop as soon as you
reach some smaller number.
* This means that all even numbers are automatically fine, because N/2
is always smaller than N.
* But similar to that, there are other bit patterns that are always
good. For example, if your number is 4n+1, then the sequence will
be:
4n+1
3(4n+1)+1 = 12n+4
6n+2
3n+1
which is smaller than 4n+1
* This means that a binary suffix of `0' and `01' make the number
safe, so looking at the last two binary digits you need to test only
`11'. When you continue going up, you get more than one
alternative, obviously.
* At some point I saw that there was a connection between safe bit
suffixes and collatz sequences -- so I wrote a program that for each
suffix computes the next set of suffixes to try, verify them using a
generated C program, and the resulting process is a list of bigger
suffixes to try. The limit I reached was due to the size of C
integers.
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!