[racket] Neil: Is there a sensible way to sample from the naturals? EOM

From: Neil Toronto (neil.toronto at gmail.com)
Date: Wed Feb 19 17:02:42 EST 2014

I've CC'd the Racket users mailing list because I thought more people 
(e.g. Robby) would be interested in the answer.

You can't sample uniformly from the naturals, but you can sample 
according to an "anything can happen" sort of distribution. You want a 
distribution with no expected value (aka mean, average).

Economic philosophers use such distributions to disabuse people of the 
idea that maximizing expected value is always the best way to make 
decisions. Here's a simple example.

  1. Flip a coin until you get heads. Call the number of flips X.
  2. You receive Y = 2^X dollars.

You can flip a fair coin, or a coin biased toward tails. Which coin do 
you choose? Obviously the biased coin. But in either case, the expected 
value of Y is infinite. Voila! Paradox.

(The problem isn't that there's no upper bound. For Y = X, the award is 
still unbounded, but for any coin with nonzero heads probability, the 
expected value is finite.)

You want *every* natural to be possible, though, so we'd have to do 
something like sample uniformly between 2^(X-1) and 2^X. Here's a 
function that does it:

   (: random-natural/no-mean (-> Real Natural))
   (define (random-natural/no-mean prob-zero)
     (define n (exact-floor (sample (geometric-dist prob-zero))))
     (define m1 (assert (expt 2 n) integer?))
     (define m0 (quotient m1 2))
     (max 0 (random-integer m0 m1)))

The "max 0" keeps TR from complaining that `random-integer' returns an 
Integer. The `prob-zero' argument is the probability the coin is heads, 
which is also the probability this function returns 0.

This looks sort of like what you want:

 > (random-natural/no-mean 0.001)
- : Integer [more precisely: Nonnegative-Integer]

 > (random-natural/no-mean 0.001)
- : Integer [more precisely: Nonnegative-Integer]

If it returns small numbers too often, you can mitigate that by taking 
the max of a few of them.

If you want a feel for the "average" output, you can put it in terms of 
average number of bits:

 > (mean (build-list 1000 (λ _ (integer-length (random-natural/no-mean 
- : Real
99 407/1000

In fact, now that I think about it, this is probably nearly equivalent:

(: random-natural/no-mean (-> Real Natural))
(define (random-natural/no-mean p)
   (random-bits (add1 (exact-floor (sample (geometric-dist p))))))

The average number of bits should be near (/ 1 p).

Neil ⊥

Posted on the users mailing list.