[racket-dev] Motivation for polymorphic opaque types

From: Neil Toronto (neil.toronto at gmail.com)
Date: Sat Jan 5 18:55:42 EST 2013

I've implemented Okasaki's purely functional, random-access lists in 
Typed Racket. They perform well there. I thought I'd see how they would 
do crossing the contract barrier, so I ported my benchmarks to Racket.

Here's what I get doing `list-ref', passing each index of 
1-to-10-element lists 10000 times:

   list-ref
   cpu time: 20 real time: 15 gc time: 0
   cpu time: 10 real time: 14 gc time: 0
   cpu time: 10 real time: 14 gc time: 0
   cpu time: 20 real time: 14 gc time: 0
   cpu time: 10 real time: 14 gc time: 0

In Typed Racket, `ralist-ref' keeps up pretty well on the same benchmark:

   ralist-ref
   cpu time: 50 real time: 47 gc time: 0
   cpu time: 50 real time: 49 gc time: 0
   cpu time: 50 real time: 47 gc time: 0
   cpu time: 40 real time: 48 gc time: 0
   cpu time: 50 real time: 47 gc time: 0

Unfortunately, it's awfully slow when used on the other side of the 
contract barrier:

   ralist-ref
   cpu time: 25280 real time: 25280 gc time: 560
   cpu time: 25150 real time: 25151 gc time: 490
   cpu time: 25500 real time: 25502 gc time: 880
   cpu time: 25110 real time: 25111 gc time: 560
   cpu time: 25130 real time: 25127 gc time: 220

This was run from the command line. It's 500x slower, which is much 
worse than anything I've seen from arrays.

I think this is the problem: the RAList type has a nested structure that 
has to be checked every time an instance crosses the barrier. Not only 
that, but checking the structure is O(n) in the size of the instance, 
which turns `ralist-ref' from O(log(n)) to O(n).

The following are the types of the values used to generate those 
numbers. Notice that they're all first-order.

   (struct: (A) leaf ([value : A]))
   (struct: (A) node ([value : A] [left : (Tree A)] [right : (Tree A)]))

   (define-type (Tree A) (U (leaf A) (node A)))

   (struct: (A) root ([size : Positive-Index] [tree : (Tree A)]))
   (struct: (A) RAList ([size : Index] [roots : (Listof (root A))])

   (: list->ralist (All (A) ((Listof A) -> (RAList A))))
   (: ralist-ref (All (A) ((RAList A) Integer -> A)))

Example: The instance created by (list->ralist '(1 2 3 4)) is

   (RAList 4 (list (root 1 (leaf 1))
                   (root 3 (node 2 (leaf 3) (leaf 4)))))

Because everything about this first-order data structure is utterly 
typical, the best performance advice I can currently give for mixed 
untyped/typed programs is this:

   Don't let anything but small instances of built-in types cross from
   typed to untyped code, or vice-versa.

Feel free to make exceptions for functions that do a lot of computation 
with their values.

**********

This is sad, because Typed Racket is better than Racket for coding up 
efficient data structures. It *makes a lot of sense* to write all the 
constructors, accessors, and basic operations in Typed Racket, because 
it optimizes the heck out of them. If only moving the values were cheap.

Why isn't it cheap? There's no reason for untyped Racket code to know 
anything about the internal workings of an RAList. That's the essence of 
data abstraction. I'm not even providing the default constructor. It 
shouldn't be possible (without significant hackery that revokes a user's 
right to safety) to create an ill-typed RAList, because they can only be 
created in Typed Racket code.

What if RAList could be exported as a polymorphic, opaque type?

It seems the contract system could then exploit the fact that an RAList 
is always well-typed. What would it take? Tagging structs? A special 
chaperone? User-definable contracts for user struct types?

If RAList were a polymorphic, opaque type, how could we put a contract 
on the following function, which instantiates it?

   (: ralist-sum ((RAList Real) -> Real))
   (define (ralist-sum lst)
     (apply + (ralist->list lst)))

Could exhaustively checking the structure of a polymorphic type be put 
off until such functions are called?

Would it be impossible for mutable struct types to be opaque? Would 
having a type variable in positive position cause problems?

I really don't want to write "Performance Warning: Using random-access 
lists in untyped Racket is currently 500x slower than using them in 
Typed Racket" at the top of the documentation for `data/ralist'.

Neil ⊥

Posted on the dev mailing list.