[racket] random big nat, please

From: Neil Toronto (neil.toronto at gmail.com)
Date: Mon Apr 8 15:47:23 EDT 2013

Moving this to the list because it seems general-interest-y.

On 04/02/2013 11:08 AM, Robby Findler wrote:
> Hi Neil; any advice on how to pick, at random, an natural number less
> than 6047071558442633408797999878063656490199914790037659740579688354375333108700179625697869824,
> or other large integers?
> I wouldn't mind uniformly random (but if I get to make wishes, I'd like
> to avoid small numbers, large numbers, and numbers near n/2).
> I tried looking in the finite distribution and integer distribution
> sections of the math library but didn't see anything.

It's not in `math/distributions' because that module doesn't have 
integer distributions yet. (Well, it does, in the style of R, where the 
"integers" are actually flonums. They're fast, but probably not exact 
enough, or may not return random numbers big enough, for what you want 
them for.)

The random big integer functions are sort of hidden in `math/base', as 
`random-natural', `random-integer' and `random-bits'. The last is the 
fastest; use it if you want a natural number less than some power of 2.

The distributions for those are uniform, so to get your wishes, you'll 
have to be a bit tricky. Here's one way: add them. If you add some 
number p of uniformly distributed numbers, as p increases, the 
distribution of the sum approaches a bell shape, which sounds like what 
you want. For example, adding two results in a triangular distribution:

#lang racket

(require plot)

(define is (build-list 20000 (λ (_) (random 100))))
(define js (build-list 20000 (λ (_) (random 100))))
(plot (density (map + is js)))

Here's a more general function that chooses large natural numbers < n by 
adding some number of uniformly random naturals:

(require math)

(define random-natural-parts 3)

(define (my-random-natural n)
   (define p (min n random-natural-parts))
   (define m (quotient n p))
   (define m-last (- n (* m (- p 1))))
   (+ (apply + (build-list (- p 1) (λ (_) (random-natural (+ m 1)))))
      (random-natural m-last)))

(define xs
   (build-list 20000 (λ (_) (my-random-natural
(plot (density xs))
(printf "max xs = ~v~n" (apply max xs))
(printf "min xs = ~v~n" (apply min xs))

It should work for all n >= 1. Use `random-natural-parts' to control the 
shape: 1 for uniform, 2 for triangular, 3 for piecewise quadratic, etc., 
etc. The higher this number is, the less probable the numbers near 0 and 
n will be.


To probabilistically avoid some kinds of numbers, you could use 
rejection sampling. If you wanted all powers of two to be 1/4 the 
probability they normally would be, you could accept them with 
probability 1/4 like this:

(define (my-random-natural* n)
   (define x (my-random-natural n))
   (cond [(or (not (power-of-two? x)) ((random) . < . 1/4))  x]
         [else  (my-random-natural* n)]))

;; Plot a histogram
(define-values (ys cs)
       ([(ys)     (build-list 20000 (λ (_) (my-random-natural* 40)))]
        [(ys cs)  (count-samples ys)])
     (sort-samples < ys cs)))

(plot (discrete-histogram (map list ys cs)))

Here, "rejection" means trying again, since you always want an answer.

To avoid numbers *near* a power of two, make the acceptance probability 
a function of the distance to the nearest power of two. We'll need a 
"nearest power of two" function first:

(define (floor-log2 x)
   (- (integer-length x) 1))

(define (ceiling-log2 x)
   (integer-length (- x 1)))

(define (nearest-pow2 x)
   (cond [(zero? x)  1]
          (define x0 (expt 2 (floor-log2 x)))
          (define x1 (expt 2 (ceiling-log2 x)))
          (if ((abs (- x x0)) . < . (abs (- x x1))) x0 x1)]))

Then make and use a function that returns acceptance probabilities:

(define pow2-prob 1/4)
(define pow2-spread 4)

(define (accept-prob x)
   (define pow2 (nearest-pow2 x))
   (define d (min 1 (/ (abs (- x pow2)) pow2-spread)))
   ;; A convex combination of 1 and pow2-prob, with fraction d
   (+ d (* (- 1 d) pow2-prob)))

(define (my-random-natural* n)
   (define x (my-random-natural n))
   (cond [((random) . < . (accept-prob x))  x]
         [else  (my-random-natural* n)]))

Here, `pow2-prob' is the probability a power of two is accepted, and 
`pow2-spread' is the minimum distance an x can be from a power of two to 
always be accepted.

 > (map accept-prob '(16 17 18 19 20 21))
'(1/4 7/16 5/8 13/16 1 1)

The `accept-prob' function can be anything, as long as it returns 
probabilities, and doesn't return small probabilities so often that 
`my-random-natural*' spends a long time looping.

Neil ⊥

Posted on the users mailing list.