[racket] Puzzled about type inference

From: Norman Gray (norman at astro.gla.ac.uk)
Date: Tue Aug 5 05:36:15 EDT 2014

Matthias and Sam, hello.

This has been a very useful brief thread for me, and also, I suspect, for others.

On 2014 Aug 4, at 01:22, Matthias Felleisen <matthias at ccs.neu.edu> wrote:

> In a sense, you put your finger on a sore spot of TR from R -- we are still figuring out how to help programmers along here.

Such help would be most valuable!

> ;; entirely unnecessary but helped me read; IGNORE 
> (define-type (Maybe α) (U False α))
> (define-type NI Nonnegative-Integer)
> (define-type LNI (Listof NI))
> (define-type VNI (Vectorof NI))

This is clearly a good idiom/pattern: very local, and telegraphically short, abbreviations for useful types.

> Then I checked on the maps and realized that I need to help the TC along with the instantiations of vector->list (how can it know with context-free reasoning what kind of vector it is) and the same for car (how can it know what kind of list you're getting here). 

So this returns to the question of what the TC can and cannot be expected to work out for itself, and I see that Sam had a couple of remarks on this later.

> (let* ((given  : LVNI '(#(1 2) #(3 4) #(5 6)))
>       (length : NI   (1+ (apply max (map (lambda ({x : VNI}) (vector-ref x 0)) given))))
>       (result : [Vectorof [Maybe NI]] (make-vector length #f)))
>  (for ((q : VNI (in-list given)))
>    (vector-set! result (vector-ref q 0) (vector-ref q 1)))
>  result)
> 
> The above -- minus types -- is how I would probably write the code in Racket, following the style guide. (I would use three defines instead of let* but let's ignore this point.)

That looks _much_ nicer than what I had, and is still perfectly readable.

> If I were to move this code to Typed Racket, I'd add a type declaration to each identifier just like in any other (explicitly, statically) typed programming language. That's what I did, and there was no problem checking and running your code. 

So the pattern I see there, which both you and Sam have illustrated, is:

  * annotate as far 'down' the tree as possible, so that type information can propagate upwards in the parse tree

  * If one liberally uses named bindings (recommended on other style grounds, in any case), then each of these is a natural location for a type annotation

  * when a polymorphic-typed function takes polymorphic arguments, annotate the arguments, not the function (that is, I see that, in (map car '(foo ...))  both you and Sam annotated the car, not the map); this is in harmony with 'annotate as far down as possible', for some value of 'down'.

Sam said:

> The most signficant differences are (1) I'm not using `cast`, which
> should be avoided when possible. `cast` is a runtime operation, and
> might be hiding arbitrary bugs. (2) I've annotated the `let`-bound
> variables, which makes life easier on the rest of the type checker and
> helps improve error messages.

Ah, now this is something I've meant to ask about: what's the difference between ann and cast?

The manual says:

> (ann e t)
> Ensure that e has type t, or some subtype. The entire expression has type t. This is legal only in expression contexts.

> (cast e t)
> The entire expression has the type t, while e may have any type. The value of the entire expression is the value returned by e, protected by a contract ensuring that it has type t. This is legal only in expression contexts.

I've never been sure whether the difference between "Ensure that e has type t" (which sounds rather runtime-y) and "The entire expression has the type t" is significant.  The only difference I could parse from this is that `cast` includes a contract, which would appear to make it _safer_ than `ann`, in the sense that if this were accidentally given a wrong-typed value than it would cause an immediate error.

I presume that the difference is that `ann` is about static type analysis, whereas `cast` is for the situation where that static analysis can't be done.  Would I be correct that in the above case I can and should write

     (ann '(#(1 2) #(3 4) #(5 6)) (Listof VNI)

since this is saying that this expression has a more specific type than the TC would work out by itself, and it can verify that the restriction is possible.  In the real program, these numbers appear from a SQL query, and so this list is actually of type (Listof (Vectorof (U Integer String))), so that I should indeed be writing

    (cast (query-rows ...) (Listof VNI))

because these numbers are coming in, dynamically, from _outside_ the reach of the static TC, and so the annotation needs to be dynamically verified at this point.  That is: `cast` is for the 'boundary' of the type system, whereas `ann` is for adding detail within it?

Ah -- I think I've just had a little epiphany.  In which case, it might be useful to transplant a little of that into the ann/cast section of the manual.  The examples shown there don't really bring out the distinction to the naive reader. (Just by the way, `assert` seems to be doing rather similar work to these two, and while I can appreciate a distinction, from the text, I'd struggle to express it compactly -- a compare-and-contrast explanation in the manual would be very illuminating)

(Does this imply that when TR code is called from untyped code, the arguments are implicitly `cast` at call-time?)

> There are a few things that make this program require more annotations
> than I'd like.
> 
> 1. TR doesn't handle automatically inferring type arguments when
> polymorphic functions (like `map`) are given polymorphic arguments
> like `vector->list`. While fixing this would require work, it just
> needs to be done -- there's no reason we shouldn't be able to make
> this work.

Righto -- that's what I wondered, whether it was 'just a matter of code' or whether there was something fundamentally more complicated about this, which I wasn't understanding.

Thanks, both, for your careful explanations.

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK



Posted on the users mailing list.