[racket] Typed Racket reactions on developing polymorphic typed array library, and questions
Some of you saw in Chicago that I was steadfastly avoiding doctoral work
by doing some 2D OpenGL game-type stuff in Racket. I decided that I
needed a nice array implementation, and later, that it should be in
Typed Racket so I could wring out as much performance as possible.
(I've read the array SRFIs. The arrays are untyped and underdeveloped,
and could probably never run at interactive speeds in my little demo.)
This email is feedback on using Typed Racket to develop a new library
(i.e. it isn't originally untyped code) that's about 3500 lines long,
and some questions.
I was inspired by Racket's sequences and the "for" macros that allow
looping over different kinds of sequence models using (almost) the same
syntax. I've made an analogous "array"/"arrayn" abstraction ("n" suffix
means multidimensional) to represent fixed-size data that is indexed by
natural numbers, is random-access, and is read-only or read/write.
Therefore, an array is a length (or shape for arrayn) and two
procedures: unsafe-ref and unsafe-set!. Vectors, flvectors, strings, and
bytes are "models" of arrays, and creating an array from one makes a
"view" of it.
Some macros with names starting with "forall" provide syntactic sugar
for mapping over arrays simultaneously. Here's an example of use:
(foralln/flvectorn:
([x : Float (<-flvectorn xs)]
[y : Float (<-flvectorn ys)])
(for/fold: : Float ([dm : Float 0.0]) ([pt : vec2 (in-list pts)])
(+ dm (delta-mass-from pt (vec2 x y))))))
Here, "flvectorn" is an N-dimensional flvector, and "<-flvectorn" is
special syntax like "in-list" (i.e. forall recognizes it as creating an
array backed by an flvectorn and inlines unsafe-flvectorn-ref accordingly).
Vincent will be happy to know that all the flonum and fixnum ops are
changed by TR's optimizer to be unsafe. (Except a certain add1 that TR
can't prove doesn't overflow; I can prove it, so I changed it myself.)
From my admittedly limited tests, I'd say that this particular forall
gets within spitting distance of equivalent C code. ("Spitting distance"
is a technical term meaning "within 3x to 5x time elapsed".) There's a
"foralln!:" that destructively updates arrays, which might get a little
closer; perhaps it gets within pinching distance of the C equivalent's
buttocks.
The rest of my reactions and questions follow.
- Indentation is in a rather evil state in TR. Could DrRacket have
more than one regexp for each "*-like keyword"? That would make it
easier to auto-indent TR's built-in loop syntax, or any other syntax
(like forall) that is in a state of flux. The indentation rules for
for/*: loops could be tricky because of the required "return" type - but
see the next item.
- Why require "return" or body types for for/*: loops in TR? I haven't
needed them in forall/*: loops. Also, writing the "Listof" in "for/list:
: (Listof T)" is annoying. I know it's a list from reading "for/list:",
and for/list: knows it, too. I also know the return type of a
"for/fold:" from the accumulators' types; why require them twice? I'm
also annoyed by having to give a body type for every named "let:".
Similarly, why require return types for functions defined with
"define:"? I'd almost rather write lambdas than give another annotation.
(Not that I use define: very much.)
I don't think it's just a preference for less verbosity. Here, TR can
support good "contracts" practice: it would usually give a more specific
body type or return type than I do. On the rare occasions it confuses
itself, I can annotate the body.
- I'm impressed by TR's dogged consistency, even though sometimes it
makes me want to go to the local animal shelter and flip off all the
kittens. I'm also impressed by its safety. I've occasionally tried to
break it for performance reasons, but couldn't.
- Related: can we get a runtime cast/refine macro? Occurrence typing
makes a naive one easy to implement: (let ([x <expr>]) (if (pred? x) x
(error ...))). Easy implementation means everyone will duplicate it;
might as well incorporate a good one.
- I've found it useful to have a macro shaped like (with-refine* ([x
pred?] ...) body ...). I have a lot of functions that take Integer and
(Listof Integer) arguments, but actually operate only on
Nonnegative-Fixnum indexes. (Nonnegativity and fixnum-ness are hard for
TR to prove.) The function bodies are wrapped with this macro.
- Requiring the 3500-line array library makes compilation take 5-10
more seconds on my desktop machine. On my laptop, it takes an
excruciating *minute* extra. It's an Eee-PC, but Racket is generally
only 2x slower on it. I think it's the extra magic syntax that types
introduce, not the checking itself, because it takes a minute to either
run the code or to tell me that I've misspelled an identifier.
I haven't felt so impatient with a compiler since programming in C++
with templates.
- Polymorphic structs being proper subtypes of their parent structs
would be really nice. As it is, I can't figure out how to cleanly have a
"(vector-array T)" subtype of "(array T)" that gives private array
functions access to the vector backing an array. If I had that, I could
collapse compositions of certain array ops at runtime and remove some
duplicate bounds checks.
- Futures don't work with TR. I had to use require/typed to get them,
so I gather that they're not officially supported. From what I can tell,
for TR code, the JIT gives a large max_let_depth to even simple thunks
like (lambda () 1). It's a shame, because splitting up forall loops with
bodies that use nothing but float ops seems like a killer app for futures.
- Pie-in-the-sky: notice that "<-flvectorn" above is really just a
type annotation that the forall macro understands. It would be nice if
compositional macros could get the types of their arguments - I could
have forall map over any array-like thing without requiring a
declaration of what it is.
I understand why this is hard: the inferred types depend on expansion,
and the expansion would depend on types. While tricky, it seems that
finding a fixpoint w.r.t. the subtype relation could work. Most macros
would iterate zero times or once. Expanding to expressions with
uncomparable types would be an error.
Typeclasses allow function dispatch on the types of arguments; this
would allow a sort of macro dispatch on the types of expanded inner
syntax. You could probably define typeclasses in terms of it.
Don't attack it too fiercely, though, as I haven't given it much deep
thought. :) Also, it is a pie... in the sky.
Neil T