[racket] Neil: Is there a sensible way to sample from the naturals? EOM

From: Neil Toronto (neil.toronto at gmail.com)
Date: Wed Feb 19 17:02:42 EST 2014

I've CC'd the Racket users mailing list because I thought more people 
(e.g. Robby) would be interested in the answer.

You can't sample uniformly from the naturals, but you can sample 
according to an "anything can happen" sort of distribution. You want a 
distribution with no expected value (aka mean, average).

Economic philosophers use such distributions to disabuse people of the 
idea that maximizing expected value is always the best way to make 
decisions. Here's a simple example.

  1. Flip a coin until you get heads. Call the number of flips X.
  2. You receive Y = 2^X dollars.

You can flip a fair coin, or a coin biased toward tails. Which coin do 
you choose? Obviously the biased coin. But in either case, the expected 
value of Y is infinite. Voila! Paradox.

(The problem isn't that there's no upper bound. For Y = X, the award is 
still unbounded, but for any coin with nonzero heads probability, the 
expected value is finite.)

You want *every* natural to be possible, though, so we'd have to do 
something like sample uniformly between 2^(X-1) and 2^X. Here's a 
function that does it:

   (: random-natural/no-mean (-> Real Natural))
   (define (random-natural/no-mean prob-zero)
     (define n (exact-floor (sample (geometric-dist prob-zero))))
     (define m1 (assert (expt 2 n) integer?))
     (define m0 (quotient m1 2))
     (max 0 (random-integer m0 m1)))

The "max 0" keeps TR from complaining that `random-integer' returns an 
Integer. The `prob-zero' argument is the probability the coin is heads, 
which is also the probability this function returns 0.

This looks sort of like what you want:

 > (random-natural/no-mean 0.001)
- : Integer [more precisely: Nonnegative-Integer]
56136474695225172011728291802626216994256833545894766283873703181
10364986394536406497817120521834403457182656624358136577815679690
73469994282060833573766848756431669747238563269112899118707963866
08154252648824183995287333693058951314331721341986222320438359883
50861513517737507150144340359987088543453799423969409721165923104
82128386772489312894482659745630141444108439622157113717027784284
7612786588731040573293479397701924913229558559022675650838885440

 > (random-natural/no-mean 0.001)
- : Integer [more precisely: Nonnegative-Integer]
57

If it returns small numbers too often, you can mitigate that by taking 
the max of a few of them.

If you want a feel for the "average" output, you can put it in terms of 
average number of bits:

 > (mean (build-list 1000 (λ _ (integer-length (random-natural/no-mean 
0.01)))))
- : Real
99 407/1000

In fact, now that I think about it, this is probably nearly equivalent:

(: random-natural/no-mean (-> Real Natural))
(define (random-natural/no-mean p)
   (random-bits (add1 (exact-floor (sample (geometric-dist p))))))

The average number of bits should be near (/ 1 p).

Neil ⊥


Posted on the users mailing list.