[racket] Small and short-lived data structures

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Mon Oct 21 12:47:38 EDT 2013

On Oct 21, 2013, at 11:33 AM, Konrad Hinsen <konrad.hinsen at fastmail.net> wrote:

> Hi everyone,
> 
> I am facing a situation quite frequent in scientific computing: the
> use of small and often short-lived data structures, where the overhead
> of construction, element access, and garbage collection can easily
> dwarf the cost of computation.
> 
> The classical example is Geometry in 3D space. Define a 3d vector as
> 
>  (struct v3d (x y z))
> 
> and then a few elementary operations such as
> 
>  (define (vadd p1 p2)
>    (v3d (+ (v3d-x p1) (v3d-x p2))
>         (+ (v3d-y p1) (v3d-y p2))
>         (+ (v3d-z p1) (v3d-z p2))))
> 
> Geometric formulas written in terms of these operations then create
> lots of short-lived intermediate results, and probably waste many CPU
> cycles in wrapping and unwrapping number triples. Ideally, a compiler
> would inline all the functions and then optimize away the intermediate
> results.
> 

You can count on Racket to inline a bunch of such small functions. 
You can use Optimization Coach to check on your understanding of
what the compiler actually does, and you may benefit from its advice. 

Currently, Racket does not perform loop fusion or deforestation
for data structures that merely connect loops/functions/etc. 

In general, it is a mistaken assumption however that even in a 
'managed' language -- such as Java or Racket -- mutation is still
faster than de/allocation of short-lived data structures. If you
choose to go with 'managed' language, you need to adapt your 
programming style to the new context. 

[Some of my colleagues discovered this problem in the early 2000s
and started thinking about how to make the GC work with an imperative
programming style. Not sure whether this works really well now.]



> I wonder how to handle such computations most efficiently in Racket.
> The internal representation would not be exposed to clients, so it can
> be chosen for performance. What's the best option? Structures? Vectors?
> Maybe even lists? Or yet something else? Would it help to move to
> Typed Racket and declare the fields as numbers, or even flonums?


I would definitely recommend (1) using Typed Racket and (2) using 
the most efficient numbers that we offer in TR. I'd also keep the 
entire program in TR. I'd still write code in the most mathematical
manner and consult OC to see how to squeeze the most out of this style. 
When I found that certain styles don't get optimized the way I expect
them to get optimized, I would ask the Racket devs whether they could
make improvements so that specific functions/programs work better. 

-- Matthias






Posted on the users mailing list.