[racket] Typed Racket reactions on developing polymorphic typed array library, and questions

From: Neil Toronto (neil.toronto at gmail.com)
Date: Mon Aug 23 00:31:09 EDT 2010

Some of you saw in Chicago that I was steadfastly avoiding doctoral work 
by doing some 2D OpenGL game-type stuff in Racket. I decided that I 
needed a nice array implementation, and later, that it should be in 
Typed Racket so I could wring out as much performance as possible.

(I've read the array SRFIs. The arrays are untyped and underdeveloped, 
and could probably never run at interactive speeds in my little demo.)

This email is feedback on using Typed Racket to develop a new library 
(i.e. it isn't originally untyped code) that's about 3500 lines long, 
and some questions.

I was inspired by Racket's sequences and the "for" macros that allow 
looping over different kinds of sequence models using (almost) the same 
syntax. I've made an analogous "array"/"arrayn" abstraction ("n" suffix 
means multidimensional) to represent fixed-size data that is indexed by 
natural numbers, is random-access, and is read-only or read/write. 
Therefore, an array is a length (or shape for arrayn) and two 
procedures: unsafe-ref and unsafe-set!. Vectors, flvectors, strings, and 
bytes are "models" of arrays, and creating an array from one makes a 
"view" of it.

Some macros with names starting with "forall" provide syntactic sugar 
for mapping over arrays simultaneously. Here's an example of use:

(foralln/flvectorn:
  ([x : Float  (<-flvectorn xs)]
   [y : Float  (<-flvectorn ys)])
  (for/fold: : Float ([dm : Float  0.0]) ([pt : vec2  (in-list pts)])
             (+ dm (delta-mass-from pt (vec2 x y))))))

Here, "flvectorn" is an N-dimensional flvector, and "<-flvectorn" is 
special syntax like "in-list" (i.e. forall recognizes it as creating an 
array backed by an flvectorn and inlines unsafe-flvectorn-ref accordingly).

Vincent will be happy to know that all the flonum and fixnum ops are 
changed by TR's optimizer to be unsafe. (Except a certain add1 that TR 
can't prove doesn't overflow; I can prove it, so I changed it myself.) 
 From my admittedly limited tests, I'd say that this particular forall 
gets within spitting distance of equivalent C code. ("Spitting distance" 
is a technical term meaning "within 3x to 5x time elapsed".) There's a 
"foralln!:" that destructively updates arrays, which might get a little 
closer; perhaps it gets within pinching distance of the C equivalent's 
buttocks.

The rest of my reactions and questions follow.

  - Indentation is in a rather evil state in TR. Could DrRacket have 
more than one regexp for each "*-like keyword"? That would make it 
easier to auto-indent TR's built-in loop syntax, or any other syntax 
(like forall) that is in a state of flux. The indentation rules for 
for/*: loops could be tricky because of the required "return" type - but 
see the next item.

  - Why require "return" or body types for for/*: loops in TR? I haven't 
needed them in forall/*: loops. Also, writing the "Listof" in "for/list: 
: (Listof T)" is annoying. I know it's a list from reading "for/list:", 
and for/list: knows it, too. I also know the return type of a 
"for/fold:" from the accumulators' types; why require them twice? I'm 
also annoyed by having to give a body type for every named "let:".

Similarly, why require return types for functions defined with 
"define:"? I'd almost rather write lambdas than give another annotation. 
(Not that I use define: very much.)

I don't think it's just a preference for less verbosity. Here, TR can 
support good "contracts" practice: it would usually give a more specific 
body type or return type than I do. On the rare occasions it confuses 
itself, I can annotate the body.

  - I'm impressed by TR's dogged consistency, even though sometimes it 
makes me want to go to the local animal shelter and flip off all the 
kittens. I'm also impressed by its safety. I've occasionally tried to 
break it for performance reasons, but couldn't.

  - Related: can we get a runtime cast/refine macro? Occurrence typing 
makes a naive one easy to implement: (let ([x <expr>]) (if (pred? x) x 
(error ...))). Easy implementation means everyone will duplicate it; 
might as well incorporate a good one.

  - I've found it useful to have a macro shaped like (with-refine* ([x 
pred?] ...) body ...). I have a lot of functions that take Integer and 
(Listof Integer) arguments, but actually operate only on 
Nonnegative-Fixnum indexes. (Nonnegativity and fixnum-ness are hard for 
TR to prove.) The function bodies are wrapped with this macro.

  - Requiring the 3500-line array library makes compilation take 5-10 
more seconds on my desktop machine. On my laptop, it takes an 
excruciating *minute* extra. It's an Eee-PC, but Racket is generally 
only 2x slower on it. I think it's the extra magic syntax that types 
introduce, not the checking itself, because it takes a minute to either 
run the code or to tell me that I've misspelled an identifier.

I haven't felt so impatient with a compiler since programming in C++ 
with templates.

  - Polymorphic structs being proper subtypes of their parent structs 
would be really nice. As it is, I can't figure out how to cleanly have a 
"(vector-array T)" subtype of "(array T)" that gives private array 
functions access to the vector backing an array. If I had that, I could 
collapse compositions of certain array ops at runtime and remove some 
duplicate bounds checks.

  - Futures don't work with TR. I had to use require/typed to get them, 
so I gather that they're not officially supported. From what I can tell, 
for TR code, the JIT gives a large max_let_depth to even simple thunks 
like (lambda () 1). It's a shame, because splitting up forall loops with 
bodies that use nothing but float ops seems like a killer app for futures.

  - Pie-in-the-sky: notice that "<-flvectorn" above is really just a 
type annotation that the forall macro understands. It would be nice if 
compositional macros could get the types of their arguments - I could 
have forall map over any array-like thing without requiring a 
declaration of what it is.

I understand why this is hard: the inferred types depend on expansion, 
and the expansion would depend on types. While tricky, it seems that 
finding a fixpoint w.r.t. the subtype relation could work. Most macros 
would iterate zero times or once. Expanding to expressions with 
uncomparable types would be an error.

Typeclasses allow function dispatch on the types of arguments; this 
would allow a sort of macro dispatch on the types of expanded inner 
syntax. You could probably define typeclasses in terms of it.

Don't attack it too fiercely, though, as I haven't given it much deep 
thought. :) Also, it is a pie... in the sky.

Neil T



Posted on the users mailing list.