[plt-scheme] Effect of caching using hash-tables vs vectors

From: jos koot (jos.koot at telefonica.net)
Date: Mon Mar 19 09:32:27 EDT 2007

Thanks,
Most interesting. I am not familiar with the sull Collatz sequence, but you did provide a clear answer and a nice pointer to chase.
Jos Koot

(((((lambda(x)((((((((x x)x)x)x)x)x)x)x))
    (lambda(x)(lambda(y)(x(x y)))))
   (lambda(x)(x)x))
  (lambda()(printf "Greetings, Jos~n"))))
  ----- Original Message ----- 
  From: Eli Barzilay 
  To: jos koot 
  Cc: Jens Axel Søgaard ; Daniel Yoo ; plt-scheme at list.cs.brown.edu 
  Sent: Monday, March 19, 2007 1:38 PM
  Subject: Re: [plt-scheme] Effect of caching using hash-tables vs vectors


  On Mar 19, jos koot wrote:
  > Hi Eli and Jens Axel,
  > Thanks for your responses. The idea of copying a vector into a
  > larger one is straight forward. But when looking for the maximum
  > cycle for the numbers 1 to 1e7, the size of the vector will grow
  > beyond practical limits. Even when storing the cycles of odd numbers
  > only, computing the maximum cycle for the numbers 1 up to but not
  > including 1e7 would require a vector of length (/ (sub1
  > 20114203639877) 2) = 10057101819938 = about 1e13. I managed to
  > compute this by memorizing the cycles of odd numbers <= 250000001 in
  > a vector and hashing the others (requiring 107176 hash-table
  > entries). The vector gives a hit/lookup rate of about 56%. The
  > hash-table is not very effective, for it provides a hit/lookup rate
  > of about 0.3%. Indeed, without the hash, the computation takes about
  > the same amount of time. It was a nice exercise.

  I think that my line started by observing that you can get rid of more
  than just the even numbers.  Something like this:

  * You test all numbers, starting from 1 and going up.

  * By the time you reach N, you know that sequences that started with
    1..N-1 all converged, so you don't need to test the sull Collatz
    sequence that starts at N -- it's enough to stop as soon as you
    reach some smaller number.

  * This means that all even numbers are automatically fine, because N/2
    is always smaller than N.

  * But similar to that, there are other bit patterns that are always
    good.  For example, if your number is 4n+1, then the sequence will
    be:
      4n+1
      3(4n+1)+1 = 12n+4
      6n+2
      3n+1
    which is smaller than 4n+1

  * This means that a binary suffix of `0' and `01' make the number
    safe, so looking at the last two binary digits you need to test only
    `11'.  When you continue going up, you get more than one
    alternative, obviously.

  * At some point I saw that there was a connection between safe bit
    suffixes and collatz sequences -- so I wrote a program that for each
    suffix computes the next set of suffixes to try, verify them using a
    generated C program, and the resulting process is a list of bigger
    suffixes to try.  The limit I reached was due to the size of C
    integers.

  -- 
            ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://www.barzilay.org/                 Maze is Life!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/users/archive/attachments/20070319/cdfad29e/attachment.html>

Posted on the users mailing list.