[racket] Randomized bisimulation tests and Metropolis-Hastings random walk

From: Neil Toronto (neil.toronto at gmail.com)
Date: Fri Jan 4 16:45:39 EST 2013

I get excited about applying statistics to programming. Here's something 
exciting I found today.

I'm working on a Typed Racket implementation of Chris Okasaki's purely 
functional random-access lists, which are O(1) for `cons', `first' and 
`rest', and basically O(log(n)) for random access. I wanted solid 
randomized tests, and I figured the best way would be bisimulation: do 
exactly the same, random thing to a (Listof Integer) and an (RAList 
Integer), then ensure the lists are the same. Iterate N times, each time 
using the altered lists for the next bisimulation step.

There's a problem with this, though: the test runtimes are all over the 
place for any fixed N, and are especially wild with large N. Sometimes 
it first does a bunch of `cons' operations in a row. Because each 
bisimulation step is O(n) in the list length (to check equality), this 
makes the remaining steps very slow. Sometimes it follows up with a 
bunch of `rest' operations, which makes the remaining steps very fast.

More precisely, random bisimulation does a "simple random walk" over 
test list lengths, where `cons' is a step from n to n+1 and `rest' is a 
step from n to n-1. Unfortunately, the n's stepped on in a simple random 
walk have no fixed probability distribution, and thus no fixed average 
or variance. That means each bisimulation step has no fixed average 
runtime or fixed variation in runtime. Wildness ensues.

One possible solution is to generate fresh test lists from a fixed 
distribution, for each bisimulation step. This is a bad solution, 
though, because I want to see whether RAList instances behave correctly 
*over time* as they're operated on.

The right solution is to use a Metropolis-Hastings random walk instead 
of a simple random walk, to sample list lengths from a fixed 
distribution. Then, the list lengths will have a fixed average, meaning 
that each bisimulation step will have a fixed average runtime, and my 
overall average test runtime will be O(N) instead of wild and crazy.

First, I choose a distribution. The geometric family is good because 
list lengths are nonnegative. Geometric with p = 0.05 means list lengths 
should be (1-p)/p = 19 on average, but are empty with probability 0.05 
and very large with very low probability.

So I add these two lines:

   (require math/distributions)

   (define length-dist (geometric-dist 0.05))

In a Metropolis-Hastings random walk, you step from n to n' with 
probability min(1,P(n')/P(n)); this is called "accepting" the step. 
Otherwise, you "reject" the step by staying in the same spot. Adding 
this new test takes only two defines and a `cond':

   (define r (random 5))  ; Randomly choose an operation
   (cond
     [(= r 0)
      ;; New! Only cons with probability P(n+1)/P(n)
      (define n (length lst))
      (define accept (/ (pdf length-dist (+ n 1))
                        (pdf length-dist n)))
      (cond [((random) . < . accept)
             .... Accept: cons and ensure continued equality ....]
            [else
             .... Reject: return the same lists ....])]
     [(= r 1)
      .... Accept: `rest' each list and ensure continued equality ....]
     .... Other random operations ....)

I left off the acceptance test for `rest' because P(n-1) > P(n) ==> 
P(n-1)/P(n) > 1, so the test would always pass. (This is because the 
geometric distribution's pdf is monotone. If I had chosen a Poisson 
distribution, which has a bump in the middle, I'd have to do the 
acceptance test for both `cons' and `rest'.)

I instrument the test loop to record list lengths, verify that their 
mean is about 19, and do some density plots vs. the geometric 
distribution. It all looks good: the randomized bisimulation test does 
indeed operate on lists whose length is distributed Geometric(0.05), so 
the overall average test runtime is O(N), and it still tests them as 
they're operated on over time.

Neil ⊥

Posted on the users mailing list.