[racket-dev] floor-log/base, ceiling-log/base, from Neil Toronto's recent commit

From: Neil Toronto (neil.toronto at gmail.com)
Date: Wed Feb 1 23:10:16 EST 2012

On 02/01/2012 05:58 PM, John Boyle wrote:
> I happened to observe this commit from today by Neil Toronto:
> http://git.racket-lang.org/plt/commitdiff/47fcdd4916a2d33ee5c28eb833397ce1d2a515e2
> I may have some useful input on this, having dealt with similar problems
> myself.
> The problem: Given b and x, you want to find the largest integer n such
> that b^n <= x (or, for the ceiling problem, the smallest integer n such
> that b^n ≥ x).  This can profitably be viewed as a case of a more
> general problem: given a monotonically increasing but otherwise opaque
> function f, find the largest integer n such that f(n) ≤ x.  The previous
> algorithm to solve this can now be viewed as a linear search, taking n
> steps.  This took too long, and Neil Toronto replaced it with an
> algorithm that depended on floating-point numbers; but I hate to see
> that happen, because floats will screw up when the numbers get too
> large; careful reasoning may prove that floats will be sufficiently
> accurate for a given application, but it would be nice to have an exact
> algorithm so that one didn't have to do that.
> I come here to present such an algorithm.  Simply put: Use a binary
> search rather than a linear search.  How do we do binary search on "the
> nonnegative integers"?  Well, start with 0 and 1 as your bounds, and
> keep doubling the 1 until you get something that's actually too high;
> then you have a real lower and upper bound.  This will take log(n) steps
> to find the upper bound, and then a further log(n) steps to tighten the
> bounds to a single integer.  Thus, this takes roughly 2*log(n) steps,
> where each step involves calculating (expt b n) plus a comparison.  It
> might be possible to do slightly better by not treating b^n as a black
> box (e.g. by using old results of b^n to compute new ones rather than
> calling "expt" from scratch, or by using "integer-length" plus some
> arithmetic to get some good initial bounds), but I suspect this is good
> enough and further complexity is not worth it.
> Note that this algorithm should work perfectly with any nonnegative
> rational arguments [other than 0 for b or x, and 1 for b], and should
> give as good an answer as any with floating-point arguments.  Also,
> "integer-finverse", as I called it, might be useful for many other
> "floor"-type computations with exact numbers (I have so used it myself
> in an ad hoc Arc math library).
> ;Finds n such that n >= 0 and f(n) <= x < f(n+1) in about 2*log(n) steps.
> ;Assumes f is monotonically increasing.
> (define (integer-finverse f x)
>    (define upper-bound
>      (let loop ((n 1))
>        (if (> (f n) x)
>            n
>            (loop (* n 2)))))
>    (define (search a b) ;b is too big, a is not too big
>      (let ((d (- b a)))
>        (if (= d 1)
>            a
>            (let ((m (+ a (quotient d 2))))
>              (if (> (f m) x)
>                  (search a m)
>                  (search m b))))))
>    (search (quotient upper-bound 2) upper-bound))
> (define (floor-log/base b x)
>    (cond ((< b 1) (- (ceiling-log/base (/ b) x)))
>          ((= b 1) (error "Base shouldn't be 1."))
>          ((< x 1) (- (ceiling-log/base b (/ x))))
>          (else (integer-finverse (lambda (n) (expt b n)) x))))
> (define (ceiling-log/base b x)
>    (cond ((< b 1) (- (floor-log/base (/ b) x)))
>          ((= b 1) (error "Base shouldn't be 1."))
>          ((< x 1) (- (floor-log/base b (/ x))))
>          (else
>           (let ((u (floor-log/base b x)))
>             (if (= x (expt b u))
>                 u
>                 (+ u 1))))))
> Testing:
>  > (floor-log/base 10 3)
> 0
>  > (floor-log/base 3 10)
> 2
>  > (ceiling-log/base 3 10)
> 3
>  > (ceiling-log/base 1/3 10)
> -2
>  > (floor-log/base 1/3 10)
> -3
>  > (floor-log/base 2/3 10)
> -6
>  > (ceiling-log/base 2/3 10)
> -5
>  > (exact->inexact (expt 3/2 6))
> 11.390625
>  > (exact->inexact (expt 3/2 5))
> 7.59375

Neet! I'll have to do some timing tests.

The functions are used to format plot labels for numbers outside 
floating-point range. As it is, the iterations will fix any under- or 
overestimate. Floating-point math is no problem: I just use it to get an 
initial estimate.

Also, I was surprised to find that the `log' function works on numbers 
well outside floating-point range. (Big ones, not small ones, which is 
why I subtract the logs of the numerator and denominator.) It works by 
taking successive integer square roots to reduce the argument first. If 
it stops working at some number size, that number is probably too large 
to fit in memory.

But for some large numbers that aren't too huge - say, megabyte-sized - 
reducing the argument takes a *long time*. I'll be interested to try 
switching to your solution for that case to see if it works out better.

Also, I'ma keep this code around for other stuff.

> I might add, by the way, that I'm inclined to expect the base to be a
> second, optional argument (perhaps defaulting to 10) to a function
> called simply "floor-log" or "ceiling-log".  I don't know if that fits
> with desired conventions, though.

The most consistent default is the natural base, but the discontinuity 
of floor/ceiling makes using transcendental bases pretty much 
impossible. :/ Next best would be 2 or 10, but I couldn't decide between 

Neil ⊥

Posted on the dev mailing list.