[plt-scheme] Effect of caching using hash-tables vs vectors

Mon Mar 19 09:19:58 EDT 2007

Sounds like you could use that kind of a strategy to go way further
than just C integers. Are C integers really the biggest numbers people
have found to converge?

Robby

On 3/19/07, Eli Barzilay <eli at barzilay.org> wrote:
> On Mar 19, jos koot wrote:
> > Hi Eli and Jens Axel,
> > Thanks for your responses. The idea of copying a vector into a
> > larger one is straight forward. But when looking for the maximum
> > cycle for the numbers 1 to 1e7, the size of the vector will grow
> > beyond practical limits. Even when storing the cycles of odd numbers
> > only, computing the maximum cycle for the numbers 1 up to but not
> > including 1e7 would require a vector of length (/ (sub1
> > 20114203639877) 2) = 10057101819938 = about 1e13. I managed to
> > compute this by memorizing the cycles of odd numbers <= 250000001 in
> > a vector and hashing the others (requiring 107176 hash-table
> > entries). The vector gives a hit/lookup rate of about 56%. The
> > hash-table is not very effective, for it provides a hit/lookup rate
> > of about 0.3%. Indeed, without the hash, the computation takes about
> > the same amount of time. It was a nice exercise.
>
> I think that my line started by observing that you can get rid of more
> than just the even numbers.  Something like this:
>
> * You test all numbers, starting from 1 and going up.
>
> * By the time you reach N, you know that sequences that started with
>   1..N-1 all converged, so you don't need to test the sull Collatz
>   sequence that starts at N -- it's enough to stop as soon as you
>   reach some smaller number.
>
> * This means that all even numbers are automatically fine, because N/2
>   is always smaller than N.
>
> * But similar to that, there are other bit patterns that are always
>   good.  For example, if your number is 4n+1, then the sequence will
>   be:
>     4n+1
>     3(4n+1)+1 = 12n+4
>     6n+2
>     3n+1
>   which is smaller than 4n+1
>
> * This means that a binary suffix of `0' and `01' make the number
>   safe, so looking at the last two binary digits you need to test only
>   `11'.  When you continue going up, you get more than one
>   alternative, obviously.
>
> * At some point I saw that there was a connection between safe bit
>   suffixes and collatz sequences -- so I wrote a program that for each
>   suffix computes the next set of suffixes to try, verify them using a
>   generated C program, and the resulting process is a list of bigger
>   suffixes to try.  The limit I reached was due to the size of C
>   integers.
>
> --
>           ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                   http://www.barzilay.org/                 Maze is Life!
> _________________________________________________
>   For list-related administrative tasks:
>   http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>