[racket] [PATCH] Speeding up set-argb-pixels

From: Michael Wilber (mwilber at uccs.edu)
Date: Sun Dec 16 13:29:58 EST 2012

TL;DR: A ~2.8x speedup from using local variables and unsafe
functions. Copying each bitmap row could bring the speedup to ~20x, but
it doesn't quite work and I need your help. Pull request at
https://github.com/plt/racket/pull/199

Hey there!

I'm writing some FFmpeg bindings for Racket. They're fast enough to
decode video in real time, but on my machine, set-argb-pixels takes
189.35±1.3 msec to run for a 500x500 image, which means I'm limited to
displaying frames at ~5fps.

Here's a toy benchmark to test set-argb-pixels:
https://gist.github.com/4a5661dfad984cfdab19

There are some very simple bottlenecks that I've started to address:

1. It turns out that the references to b&w? and alpha-channel-local? for
   each pixel are slow slow slow. Making them local variables drops the
   time down to 124.8±1.0 msec. This three-line change gives a speedup
   factor of ~1.5.

2. Using unsafe functions everywhere (unsafe-bytes-ref and friends,
   unsafe-fx+ and friends) drops it further to 67.05±0.6 msec, which is
   a speedup factor of ~2.82 over the original on my machine.

A pull request for the above is at
https://github.com/plt/racket/pull/199
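
To illustrate the two changes, here is a minimal sketch. It is not the
code from racket/draw or from the pull request: toy-bitmap%, get-data
and set-pixels! are hypothetical names, and the loop only reorders ARGB
bytes to BGRA without any clipping or premultiplication. It shows the
fields being read once into locals and the inner loop using operations
from racket/unsafe/ops after a single up-front size check.

```racket
#lang racket/base
(require racket/class racket/unsafe/ops)

;; Toy stand-in for a bitmap class; NOT the racket/draw implementation.
(define toy-bitmap%
  (class object%
    (init-field width height [alpha-channel? #t])
    (super-new)
    ;; Destination buffer in BGRA byte order, 4 bytes per pixel.
    (define data (make-bytes (* 4 width height) 0))
    (define/public (get-data) data)

    ;; src holds ARGB bytes, 4 per pixel.
    (define/public (set-pixels! src)
      (define n (* width height))
      ;; Check sizes once up front; the unsafe operations below do no checking.
      (unless (and (bytes? src) (>= (bytes-length src) (* 4 n)))
        (raise-argument-error 'set-pixels!
                              "bytes with at least 4*width*height elements"
                              src))
      ;; Read the fields once, outside the loop, instead of once per pixel.
      (define alpha? alpha-channel?)
      (define dst data)
      (let loop ([i 0])
        (when (unsafe-fx< i n)
          (let ([p (unsafe-fx* i 4)])
            ;; B, G, R, then A (forced to 255 when there is no alpha channel)
            (unsafe-bytes-set! dst p (unsafe-bytes-ref src (unsafe-fx+ p 3)))
            (unsafe-bytes-set! dst (unsafe-fx+ p 1)
                               (unsafe-bytes-ref src (unsafe-fx+ p 2)))
            (unsafe-bytes-set! dst (unsafe-fx+ p 2)
                               (unsafe-bytes-ref src (unsafe-fx+ p 1)))
            (unsafe-bytes-set! dst (unsafe-fx+ p 3)
                               (if alpha? (unsafe-bytes-ref src p) 255)))
          (loop (unsafe-fx+ i 1)))))))
```

The size check before the loop matters: unsafe operations skip bounds
and type checks, so a too-short input would otherwise read out of
bounds instead of raising an error.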

Now, if we can assume that the input bytes already contain pre-clipped,
premultiplied data, we don't really have to loop through each pixel. If
we copy each row using bytes-copy!, that drops the function to 9.55±6.1
msec (!), which is a speedup factor of ~20 over the original.
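
A rough sketch of the row-copy idea, under stated assumptions:
copy-rows! is a hypothetical helper, and dst stands in for the
destination pixel buffer, whose rows are assumed to be `stride` bytes
apart (Cairo image surfaces can pad each row, so the stride may be
larger than 4*width). The copy does no byte reordering at all.

```racket
#lang racket/base

;; Hypothetical helper: copy h rows of w pixels (4 bytes each) from a
;; tightly packed src into dst, whose rows start every `stride` bytes.
(define (copy-rows! dst stride src w h)
  (define row-bytes (* 4 w))
  (unless (>= stride row-bytes)
    (raise-argument-error 'copy-rows! "stride of at least 4*w" stride))
  (for ([y (in-range h)])
    ;; One bulk copy per row instead of one iteration per pixel.
    (bytes-copy! dst (* y stride)
                 src (* y row-bytes) (* (+ y 1) row-bytes)))
  dst)
```

Copying row by row rather than in one call is only needed when the
stride differs from 4*width; with equal values a single bytes-copy!
over the whole buffer would do the same job.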

The problem is that on my little-endian machine, Cairo expects the
input data in BGRA byte order, not ARGB, so the colors look wrong. Alas,
this is why Racket's doing all the byte swizzling manually.

Is there a fast native way of switching the endianness of a byte vector
assumed to contain 32-bit ints? Or some way to do what we want?
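
For reference, one way to express the swap with built-in functions is
sketched below. swap-32-bit-words! is a hypothetical name, and this has
not been benchmarked, so it may well be no faster than the manual loop.
It reads each group of 4 bytes as a big-endian unsigned integer and
writes it back little-endian, which reverses the bytes within each
32-bit word (ARGB becomes BGRA).

```racket
#lang racket/base

;; Hypothetical helper: reverse the byte order of every 32-bit word in
;; src, writing the result into dst. Not benchmarked.
(define (swap-32-bit-words! dst src)
  (define n (bytes-length src))
  (unless (zero? (remainder n 4))
    (raise-argument-error 'swap-32-bit-words!
                          "bytes whose length is a multiple of 4" src))
  (unless (>= (bytes-length dst) n)
    (raise-argument-error 'swap-32-bit-words!
                          "bytes at least as long as src" dst))
  (for ([p (in-range 0 n 4)])
    ;; Read 4 bytes big-endian, then store them little-endian at the
    ;; same offset in dst.
    (integer->integer-bytes (integer-bytes->integer src #f #t p (+ p 4))
                            4 #f #f dst p))
  dst)
```

For example, (swap-32-bit-words! (make-bytes 8) (bytes 255 1 2 3 128 4 5 6))
produces the bytes 3 2 1 255 6 5 4 128.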

If there's a way to do this, this could make playing simple
low-resolution videos from Racket pretty feasible.


Posted on the users mailing list.