[racket] Randomized bisimulation tests and Metropolis-Hastings random walk
I get excited about applying statistics to programming. Here's something
exciting I found today.
I'm working on a Typed Racket implementation of Chris Okasaki's purely
functional random-access lists, which are O(1) for `cons', `first' and
`rest', and basically O(log(n)) for random access. I wanted solid
randomized tests, and I figured the best way would be bisimulation: do
exactly the same, random thing to a (Listof Integer) and an (RAList
Integer), then ensure the lists are the same. Iterate N times, each time
using the altered lists for the next bisimulation step.
There's a problem with this, though: the test runtimes are all over the
place for any fixed N, and are especially wild with large N. Sometimes
it first does a bunch of `cons' operations in a row. Because each
bisimulation step is O(n) in the list length (to check equality), this
makes the remaining steps very slow. Sometimes it follows up with a
bunch of `rest' operations, which makes the remaining steps very fast.
More precisely, random bisimulation does a "simple random walk" over
test list lengths, where `cons' is a step from n to n+1 and `rest' is a
step from n to n-1. Unfortunately, the n's stepped on in a simple random
walk have no fixed probability distribution, and thus no fixed average
or variance. That means each bisimulation step has no fixed average
runtime or fixed variation in runtime. Wildness ensues.
One possible solution is to generate fresh test lists from a fixed
distribution, for each bisimulation step. This is a bad solution,
though, because I want to see whether RAList instances behave correctly
*over time* as they're operated on.
The right solution is to use a Metropolis-Hastings random walk instead
of a simple random walk, to sample list lengths from a fixed
distribution. Then, the list lengths will have a fixed average, meaning
that each bisimulation step will have a fixed average runtime, and my
overall average test runtime will be O(N) instead of wild and crazy.
First, I choose a distribution. The geometric family is good because
list lengths are nonnegative. Geometric with p = 0.05 means list lengths
should be (1-p)/p = 19 on average, but are empty with probability 0.05
and very large with very low probability.
So I add these two lines:
(require math/distributions)
(define length-dist (geometric-dist 0.05))
In a Metropolis-Hastings random walk, you step from n to n' with
probability min(1,P(n')/P(n)); this is called "accepting" the step.
Otherwise, you "reject" the step by staying in the same spot. Adding
this new test takes only two defines and a `cond':
(define r (random 5)) ; Randomly choose an operation
(cond
[(= r 0)
;; New! Only cons with probability P(n+1)/P(n)
(define n (length lst))
(define accept (/ (pdf length-dist (+ n 1))
(pdf length-dist n)))
(cond [((random) . < . accept)
.... Accept: cons and ensure continued equality ....]
[else
.... Reject: return the same lists ....])]
[(= r 1)
.... Accept: `rest' each list and ensure continued equality ....]
.... Other random operations ....)
I left off the acceptance test for `rest' because P(n-1) > P(n) ==>
P(n-1)/P(n) > 1, so the test would always pass. (This is because the
geometric distribution's pdf is monotone. If I had chosen a Poisson
distribution, which has a bump in the middle, I'd have to do the
acceptance test for both `cons' and `rest'.)
I instrument the test loop to record list lengths, verify that their
mean is about 19, and do some density plots vs. the geometric
distribution. It all looks good: the randomized bisimulation test does
indeed operate on lists whose length is distributed Geometric(0.05), so
the overall average test runtime is O(N), and it still tests them as
they're operated on over time.
Neil ⊥