[racket-dev] Inferring numeric function types from representative argument values

From: Vincent St-Amour (stamourv at ccs.neu.edu)
Date: Fri Jan 4 12:38:12 EST 2013

At Thu, 03 Jan 2013 21:09:52 -0700,
Neil Toronto wrote:
> 
> I thought it would be helpful to find the most precise types possible 
> for numeric functions. I wrote a program that infers them using a few 
> thousand representative argument values, which have been chosen to 
> exhibit underflow, overflow, exactness preservation (e.g. perfect 
> squares for `sqrt' and 1 for `log'), and every special value.

Cool!

I can see this being useful to tighten existing types.

If the program could only report cases where the inferred type is more
precise than existing types, that would be great. Otherwise, having to
manually look for cases to add to existing types would be very
labor-intensive. That should be doable by reusing some of the machinery
from the random tester.

Also, it would be great to extend your program to infer filters, too.
Types with filters are probably the trickiest to write/get right. Not
sure how to infer those without getting a lot of noise, though.

> Inference generalizes return types a little (PosRatNotInt to PosRat, for 
> example), joins function types with the same return type, compresses 
> unions of products, and names common union types.

Sounds good.

You may be interested in TR's typechecking rules for literals, which do
similar generalizations. (Not to be confused with `generalize', in the
TR implementation, which generalizes types in invariant positions, for
usability.) It's in `tc-literal' in `typechecker/tc-expr-unit.rkt'.
Numeric types returned by `tc-literal' should be the only ones that
users have to deal with. Others, like `PosRatNotInt', are not exported
and should not be used in the standard library.

> It doesn't construct a  monotone `case->' type by computing a
> cumulative union

For unary function types, that's pretty straightforward.

For binary function types and beyond, it gets tricky. Assume you have a
function whose domain types include both `Byte Int' and `Int Byte'.
Which do you put first? Neither subsumes the other, and depending on
which you pick, the return type will be more precise in some cases and
less in others, and vice versa. At this point, the ordering decision
becomes a matter of usability. I don't know how much of that is
automatable.

By the way, this is not just a theoretical concern. The limitation you
reported recently about `(i . fx> . 0)' was exactly that.

A solution, which Sam and I discussed some time ago, would be to change
how typechecking of intersection types works. Instead of looking at the
cases in order and picking the first one that matches, the typechecker
could pick all the cases that match, and intersect the return types.
This could tremendously simplify the types of many numeric operations
(and potentially speed up typechecking), and would solve the problem
above.

However, the consequences of that change are not obvious. It's clearly
backwards-incompatible, but it's unclear which programs such a change
would break, if any. Maybe it's worth revisiting that idea.

Vincent

Posted on the dev mailing list.