[plt-scheme] Re: More PLT Scheme as an Alternative to Matlab

From: Scott (xscottg at gmail.com)
Date: Sat Aug 15 00:23:05 EDT 2009

On Aug 13, 7:30 pm, Doug Williams <m.douglas.willi... at gmail.com>
wrote:
>
> I think the first thing we need to do is to define such a multidimensional
> array representation for PLT Scheme.
>

I normally just lurk on this list (I'm only a novice with Scheme), but
I figured I'd crawl out of hiding for this topic.  (I tried to post a
similar message last night but didn't realize I couldn't post through
Google groups without being subscribed directly to the list.)

First, let me say how great I think it would be to have a library like
this.  I've thought a lot about array libraries over the last few
years, and I would love to have one for PLT Scheme.  (All of that to
say, "thank you, and please take all of my comments below as
supportive of your goal!"  :-)


>
> [...] arrays have an order, which may be 'row or 'column.
>

When you implement an array transpose for more than two dimensions
(basically swapping two axes), and if you don't require a copy of the
array, you can easily end up with arrays that are neither 'row nor
'column major.

Similarly, you'll probably want to support striding or reversing an
array without making a copy under the hood.  For example, to pull out
a sub-array, you might create a "take" operator like such:

    (take some-array #:start 3000 #:stop 1000 #:step -2)

The actual syntax could be part of your array-ref function, and the
actual parameters could be "to" and "from" and "by" etc.., but in any
case the result will be a copy or it won't be either row or column
major.  Also note that a one dimensional array is *both* row and
column major (Fortran and C will both be happy with it).

Regardless, it's a straightforward predicate to determine if an array
is row major or column major from its other details (shape, strides,
and offset).  (I mention zero dimensional arrays below, but whether a
zero dimensional array is row major or not is probably a koan. :-)


One thing that always frustrated me about NumPy is that you don't
really know whether you've gotten a copy or a view for many
operations.  Sometimes it is safe to modify the copy, but other times
you're trashing the original data in a view, and the same piece of
code can do either based on runtime arguments to the function you're
calling.  I believe one correct thing to do is to implement copy-on-
write for any shape changing operation, but that's trickier.  Treating
arrays as immutable is attractive (especially in a functional
programming language), and an efficient implementation for that is
probably even more complicated.  Noel Welsh mentioned the SAC
language, and it looks like they would be a good place from which to
get ideas.


Another frustrating thing in NumPy is the topic of zero-dimensional
arrays (scalars) and how they integrate with the rest of the language
and libraries.  Does subscripting an N-dimensional array by N
subscripts give you a Scheme numeric value, or a zero dimensional
array?  If it's a standard Scheme value, can you broadcast (or "new
axis") a Scheme scalar value back to an array.  Will it change it's
type when you do this?  Can you subscript with zero subscripts?

It looks like this is also covered in the SAC language, and they seem
to have a really nice way to do things, but to completely fit that way
into Scheme might require that Scheme think of it's numeric values as
zero dimensional arrays.  A special PLT language could integrate
arrays into the numeric tower, but that's probably not what you had in
mind for your library.  There is an unfortunate compromise to make in
there somewhere.


Similarly, when you do an element-wise multiplication of two arrays,
you'll probably promote to a common super type (multiplying "signed 16
bit" by "32 bit floating point" should probably give you "32 bit
floating point").  What should you promote to when you multiply an
array by a Scheme scalar floating point or integer value?  Will you
support in-place multiplication, and will it give you the same type?

There are lots of questions like that, and different people will give
you different answers about the "right" way to do things...  I have my
own opinions, but they aren't more authoritative than any other.


>
> The types are object, u8, u16, u32, u64, s8, s16, s32, s64, f32, f64, c64, and c128.
>

In addition to the types you list above, I would personally benefit
from complex integer formats.  We use NumPy at work, and it is very
inconvenient that we have to declare an N-by-2 array (translated to
your syntax):

        (make-array (list N 2) #:type s16)

When what we really want is a one dimensional array of N complex 16
bit integers:

        (make-array (list N) #:type cs16)

If your library takes off, my guess is that someone will ask you for
quaternions and arrays of structures (RGBA pixels for instance)
too.  :-)

(As a silly style suggestion, I think you should call your complex
floating point types cf32 and cf64 instead of c64 and c128.)


I think other items will eventually crop up too... For instance,
you'll probably want to be able to read data off disk or serialize it
across a network.  In this case, you'll need to keep track of the
endianess.  (Thankfully VAX floating point is mostly dead. :-)

This thread has already started discussing how the memory for typed
arrays could be handled specially with respect to the garbage
collector.  Someone will want to process huge arrays with memory
mapped files (larger than can possibly fit in RAM).  I think you'll
find that eventually you want to implement your arrays on top of a
generic "buffer of bytes" type like R6RS bytevectors.


I sincerely hope that none of this discourages you!  I just wanted to
point out a few details I think you'll run into sooner or later.
Maybe that will spare you some of the backwards compatibility baggage
that other array libraries are stuck with.


Cheers,
    -Scott



Posted on the users mailing list.