[racket] Puzzled about type inference

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Tue Aug 5 09:23:00 EDT 2014

In principle your email summarizes it all. I would like to add two comments though: 

--- 1. There was no contradiction between Sam's email (happy birthday Sam) and mine. My initial reaction was to fix up your program in a 'least intrusive' manner. That's why I left in the cast and simply pinpointed those places in the program where the type checker really needs help. When I found the open DrRacket buffer again, I saw the code in a whole new light, especially in violation of my (untyped) coding style and once I recognized the delta, I realized that I really wanted to write the program according to the Typed Racket philosophy: 

	add type declarations to variables and fields and function and method signatures. 

That should work. No code rewrites necessary. Period. -- Sam's reply points out that this is basically possible for your code, without the style-based rewrite. [We confirmed this point in an off-line exchange.] 

--- 2. Well, the 'period' part leads my to my second comment. While we have achieved our goal -- mostly, kind of -- for mostly functional Racket, we're not there yet. On occasion, we do need to insert casts. Worse, we do need to rewrite code. Worst, the type annotations are more of a burden than the TR slogan leads us to believe. 

Yes, type annotations are helpful for any future reader of the program, especially if they are correct. They bring across intentions, and with good function and type names, extensions. You need little more to 'get' the code. In a sense, type annotations explicate design documentation the way the HtDP design recipe suggests. [And yes, we also need tests to illustrate abstract statements and property-checking would be even better.] The code truly explains itself -- when combined with the rules from the Style guide (which we don't always follow and couldn't in the past). 

But, type annotations become really heavy in some cases (polymorphism) and simply labor in others. ____(: s String) (define s "hello world")___ anyone? So Sam injected _local type inference_ into the original TR. LOCAL means just that -- within a maximal S-expression TR tries to restore missing type declarations. Sometimes it works, sometimes it fails. 

We know from experience that MODULE-GLOBAL inference (Hindley Milner a la ML or Hindley-Milner soft type a la Fagan/Wright) doesn't work. Error messages are bad and cost more time than the reduction in labor is worth. 

Bottom line is then, TR is in an actual state and then there is the ideal that some of us have in our head where local type inference does a lot more and becomes almost as good as GLOBAL ti in terms of convenience. It is critical to keep this in mind. TR isn'f finished and it will continue to move from 'actual' to 'ideal'. 


Having said that, this thread definitely points out (1) we need to fix some of the write-ups about TR and (2) we should add sections about TR to the Style guide. We will do so ... 

-- Matthias
















On Aug 5, 2014, at 5:36 AM, Norman Gray wrote:

> 
> Matthias and Sam, hello.
> 
> This has been a very useful brief thread for me, and also, I suspect, for others.
> 
> On 2014 Aug 4, at 01:22, Matthias Felleisen <matthias at ccs.neu.edu> wrote:
> 
>> In a sense, you put your finger on a sore spot of TR from R -- we are still figuring out how to help programmers along here.
> 
> Such help would be most valuable!
> 
>> ;; entirely unnecessary but helped me read; IGNORE 
>> (define-type (Maybe α) (U False α))
>> (define-type NI Nonnegative-Integer)
>> (define-type LNI (Listof NI))
>> (define-type VNI (Vectorof NI))
> 
> This is clearly a good idiom/pattern: very local, and telegraphically short, abbreviations for useful types.
> 
>> Then I checked on the maps and realized that I need to help the TC along with the instantiations of vector->list (how can it know with context-free reasoning what kind of vector it is) and the same for car (how can it know what kind of list you're getting here). 
> 
> So this returns to the question of what the TC can and cannot be expected to work out for itself, and I see that Sam had a couple of remarks on this later.
> 
>> (let* ((given  : LVNI '(#(1 2) #(3 4) #(5 6)))
>>      (length : NI   (1+ (apply max (map (lambda ({x : VNI}) (vector-ref x 0)) given))))
>>      (result : [Vectorof [Maybe NI]] (make-vector length #f)))
>> (for ((q : VNI (in-list given)))
>>   (vector-set! result (vector-ref q 0) (vector-ref q 1)))
>> result)
>> 
>> The above -- minus types -- is how I would probably write the code in Racket, following the style guide. (I would use three defines instead of let* but let's ignore this point.)
> 
> That looks _much_ nicer than what I had, and is still perfectly readable.
> 
>> If I were to move this code to Typed Racket, I'd add a type declaration to each identifier just like in any other (explicitly, statically) typed programming language. That's what I did, and there was no problem checking and running your code. 
> 
> So the pattern I see there, which both you and Sam have illustrated, is:
> 
>  * annotate as far 'down' the tree as possible, so that type information can propagate upwards in the parse tree
> 
>  * If one liberally uses named bindings (recommended on other style grounds, in any case), then each of these is a natural location for a type annotation
> 
>  * when a polymorphic-typed function takes polymorphic arguments, annotate the arguments, not the function (that is, I see that, in (map car '(foo ...))  both you and Sam annotated the car, not the map); this is in harmony with 'annotate as far down as possible', for some value of 'down'.
> 
> Sam said:
> 
>> The most signficant differences are (1) I'm not using `cast`, which
>> should be avoided when possible. `cast` is a runtime operation, and
>> might be hiding arbitrary bugs. (2) I've annotated the `let`-bound
>> variables, which makes life easier on the rest of the type checker and
>> helps improve error messages.
> 
> Ah, now this is something I've meant to ask about: what's the difference between ann and cast?
> 
> The manual says:
> 
>> (ann e t)
>> Ensure that e has type t, or some subtype. The entire expression has type t. This is legal only in expression contexts.
> 
>> (cast e t)
>> The entire expression has the type t, while e may have any type. The value of the entire expression is the value returned by e, protected by a contract ensuring that it has type t. This is legal only in expression contexts.
> 
> I've never been sure whether the difference between "Ensure that e has type t" (which sounds rather runtime-y) and "The entire expression has the type t" is significant.  The only difference I could parse from this is that `cast` includes a contract, which would appear to make it _safer_ than `ann`, in the sense that if this were accidentally given a wrong-typed value than it would cause an immediate error.
> 
> I presume that the difference is that `ann` is about static type analysis, whereas `cast` is for the situation where that static analysis can't be done.  Would I be correct that in the above case I can and should write
> 
>     (ann '(#(1 2) #(3 4) #(5 6)) (Listof VNI)
> 
> since this is saying that this expression has a more specific type than the TC would work out by itself, and it can verify that the restriction is possible.  In the real program, these numbers appear from a SQL query, and so this list is actually of type (Listof (Vectorof (U Integer String))), so that I should indeed be writing
> 
>    (cast (query-rows ...) (Listof VNI))
> 
> because these numbers are coming in, dynamically, from _outside_ the reach of the static TC, and so the annotation needs to be dynamically verified at this point.  That is: `cast` is for the 'boundary' of the type system, whereas `ann` is for adding detail within it?
> 
> Ah -- I think I've just had a little epiphany.  In which case, it might be useful to transplant a little of that into the ann/cast section of the manual.  The examples shown there don't really bring out the distinction to the naive reader. (Just by the way, `assert` seems to be doing rather similar work to these two, and while I can appreciate a distinction, from the text, I'd struggle to express it compactly -- a compare-and-contrast explanation in the manual would be very illuminating)
> 
> (Does this imply that when TR code is called from untyped code, the arguments are implicitly `cast` at call-time?)
> 
>> There are a few things that make this program require more annotations
>> than I'd like.
>> 
>> 1. TR doesn't handle automatically inferring type arguments when
>> polymorphic functions (like `map`) are given polymorphic arguments
>> like `vector->list`. While fixing this would require work, it just
>> needs to be done -- there's no reason we shouldn't be able to make
>> this work.
> 
> Righto -- that's what I wondered, whether it was 'just a matter of code' or whether there was something fundamentally more complicated about this, which I wasn't understanding.
> 
> Thanks, both, for your careful explanations.
> 
> Best wishes,
> 
> Norman
> 
> 
> -- 
> Norman Gray  :  http://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
> 



Posted on the users mailing list.