[plt-scheme] Effect of caching using hash-tables vs vectors

From: Eli Barzilay (eli at barzilay.org)
Date: Mon Mar 19 08:38:25 EDT 2007

On Mar 19, jos koot wrote:
> Hi Eli and Jens Axel,
> Thanks for your responses. The idea of copying a vector into a
> larger one is straight forward. But when looking for the maximum
> cycle for the numbers 1 to 1e7, the size of the vector will grow
> beyond practical limits. Even when storing the cycles of odd numbers
> only, computing the maximum cycle for the numbers 1 up to but not
> including 1e7 would require a vector of length (/ (sub1
> 20114203639877) 2) = 10057101819938 = about 1e13. I managed to
> compute this by memorizing the cycles of odd numbers <= 250000001 in
> a vector and hashing the others (requiring 107176 hash-table
> entries). The vector gives a hit/lookup rate of about 56%. The
> hash-table is not very effective, for it provides a hit/lookup rate
> of about 0.3%. Indeed, without the hash, the computation takes about
> the same amount of time. It was a nice exercise.

I think that my line started by observing that you can get rid of more
than just the even numbers.  Something like this:

* You test all numbers, starting from 1 and going up.

* By the time you reach N, you know that sequences that started with
  1..N-1 all converged, so you don't need to test the sull Collatz
  sequence that starts at N -- it's enough to stop as soon as you
  reach some smaller number.

* This means that all even numbers are automatically fine, because N/2
  is always smaller than N.

* But similar to that, there are other bit patterns that are always
  good.  For example, if your number is 4n+1, then the sequence will
  be:
    4n+1
    3(4n+1)+1 = 12n+4
    6n+2
    3n+1
  which is smaller than 4n+1

* This means that a binary suffix of `0' and `01' make the number
  safe, so looking at the last two binary digits you need to test only
  `11'.  When you continue going up, you get more than one
  alternative, obviously.

* At some point I saw that there was a connection between safe bit
  suffixes and collatz sequences -- so I wrote a program that for each
  suffix computes the next set of suffixes to try, verify them using a
  generated C program, and the resulting process is a list of bigger
  suffixes to try.  The limit I reached was due to the size of C
  integers.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.