[plt-scheme] Re: More PLT Scheme as an Alternative to Matlab

From: Doug Williams (m.douglas.williams at gmail.com)
Date: Sat Aug 15 12:14:36 EDT 2009

>
> I normally just lurk on this list (I'm only a novice with Scheme), but
> I figured I'd crawl out of hiding for this topic.  (I tried to post a
> similar message last night but didn't realize I couldn't post through
> Google groups without being subscribed directly to the list.)


Thanks for joining in.

>
> > [...] arrays have an order, which may be 'row or 'column.
> >
>
> When you implement an array transpose for more than two dimensions
> (basically swapping two axes), and if you don't require a copy of the
> array, you can easily end up with arrays that are neither 'row nor
> 'column major.


There are operations that require a copy; I know of no way around that.


> Similarly, you'll probably want to support striding or reversing an
> array without making a copy under the hood.  For example, to pull out
> a sub-array, you might create a "take" operator like such:
>
>    (take some-array #:start 3000 #:stop 1000 #:step -2)
>

> The actual syntax could be part of your array-ref function, and the
> actual parameters could be "to" and "from" and "by" etc., but in any
> case the result will be a copy or it won't be either row or column
> major.  Also note that a one dimensional array is *both* row and
> column major (Fortran and C will both be happy with it).


I was thinking of a syntax like (<start> [<stop> [<step>]]), so (array-ref
some-array '((* * 3) 4)) would be column 4 of every third row of some 2D
array. I don't have a problem with negative steps. I'll have to play around
and see how negative strides etc. work out, but I'm sure some copies will be
required. Yes, for a 1-dimensional array it doesn't matter what the order
is - and it is its own transpose.
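
To make that concrete, here is a minimal sketch (a hypothetical helper, not
part of the library) of how a (<start> <stop> <step>) spec, with * meaning
"use the default", could expand into the indices it selects along one
dimension:

    ;; Expand a dimension spec into the list of selected indices for a
    ;; dimension of length len.  Negative steps work with explicit bounds.
    (define (expand-spec spec len)
      (let ((start (if (eq? (car spec) '*) 0 (car spec)))
            (stop  (if (eq? (cadr spec) '*) len (cadr spec)))
            (step  (if (eq? (caddr spec) '*) 1 (caddr spec))))
        (let loop ((i start) (acc '()))
          (if (if (positive? step) (< i stop) (> i stop))
              (loop (+ i step) (cons i acc))
              (reverse acc)))))

    ;; (expand-spec '(* * 3) 10)  => (0 3 6 9)
    ;; (expand-spec '(8 0 -2) 10) => (8 6 4 2)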

> One thing that always frustrated me about NumPy is that you don't
> really know whether you've gotten a copy or a view for many
> operations.  Sometimes it is safe to modify the copy, but other times
> you're trashing the original data in a view, and the same piece of
> code can do either based on runtime arguments to the function you're
> calling.  I believe one correct thing to do is to implement copy-on-
> write for any shape changing operation, but that's trickier.  Treating
> arrays as immutable is attractive (especially in a functional
> programming language), and an efficient implementation for that is
> probably even more complicated.  Noel Welsh mentioned the SAC
> language, and it looks like they would be a good place from which to
> get ideas.


I am leaning toward immutability, so it shouldn't matter whether it is a copy
or not.
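
With immutable backing storage, the copy/view distinction becomes
unobservable. PLT Scheme already has immutable vectors, which could serve as
the data store; a quick illustration:

    ;; A slice can freely share an immutable vector, since no caller can
    ;; mutate it out from under another.
    (define data (vector->immutable-vector (vector 1 2 3 4)))
    (immutable? data)            ; => #t
    ;; (vector-set! data 0 99)   ; would raise an exception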

> Another frustrating thing in NumPy is the topic of zero-dimensional
> arrays (scalars) and how they integrate with the rest of the language
> and libraries.  Does subscripting an N-dimensional array by N
> subscripts give you a Scheme numeric value, or a zero dimensional
> array?  If it's a standard Scheme value, can you broadcast (or "new
> axis") a Scheme scalar value back to an array.  Will it change it's
> type when you do this?  Can you subscript with zero subscripts?


My current implementation has a schizophrenic array-ref: if a reference is
fully qualified (i.e., values are specified for all the dimensions), it
returns a Scheme scalar value; otherwise it returns a new array object for
the referenced subarray (slice). The subarray shares the parent array's data
if possible; otherwise a new data vector is made for the subarray. I will use
broadcasting - so a scalar can look like an array of any shape.
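
A toy sketch of that dual behavior (not the actual implementation - here a
2D "array" is just a vector of row vectors, and * marks an unspecified
dimension):

    ;; Fully qualified subscripts reach a scalar; a * leaves that axis
    ;; intact, so the result is a (sub)array instead.
    (define (toy-ref arr . specs)
      (cond ((null? specs) arr)          ; no subscripts -> whole array
            ((eq? (car specs) '*)        ; wildcard -> keep this axis
             (list->vector
              (map (lambda (sub) (apply toy-ref sub (cdr specs)))
                   (vector->list arr))))
            (else                        ; index -> descend one axis
             (apply toy-ref (vector-ref arr (car specs)) (cdr specs)))))

    ;; (toy-ref '#(#(1 2) #(3 4)) 1 0)  => 3       (a Scheme scalar)
    ;; (toy-ref '#(#(1 2) #(3 4)) '* 0) => #(1 3)  (a slice: column 0)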

> It looks like this is also covered in the SAC language, and they seem
> to have a really nice way to do things, but to completely fit that way
> into Scheme might require that Scheme think of its numeric values as
> zero dimensional arrays.  A special PLT language could integrate
> arrays into the numeric tower, but that's probably not what you had in
> mind for your library.  There is an unfortunate compromise to make in
> there somewhere.


As a library, I can't control the Scheme side of things. [But I've found
the PLT Scheme implementors like to do the 'right thing', and they do
listen.] So, compromises at the boundary between the array collection and
Scheme will be required.

>
>
>
> Similarly, when you do an element-wise multiplication of two arrays,
> you'll probably promote to a common super type (multiplying "signed 16
> bit" by "32 bit floating point" should probably give you "32 bit
> floating point").  What should you promote to when you multiply an
> array by a Scheme scalar floating point or integer value?  Will you
> support in-place multiplication, and will it give you the same type?


One of the compromises I am making is that all computations will be done in
native Scheme code using the Scheme numeric tower. I'll only coerce things
at the array storage level. Indeed, the default array type uses Scheme
objects, and the array library won't do any coercions at all. The operations
don't even have to be numeric.
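
For example (a sketch using PLT Scheme's SRFI 4 homogeneous vectors, which
is roughly the storage model I have in mind): arithmetic happens in the full
numeric tower, and coercion occurs only when the result is stored into a
typed vector:

    (require srfi/4)

    ;; (* 2 1/3) is computed exactly in the numeric tower; the coercion
    ;; to a 64-bit float happens only at the storage boundary.
    (define v (make-f64vector 4 0.0))
    (f64vector-set! v 0 (exact->inexact (* 2 1/3)))
    (f64vector-ref v 0)  ; => 0.6666666666666666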

>
>
> There are lots of questions like that, and different people will give
> you different answers about the "right" way to do things...  I have my
> own opinions, but they aren't more authoritative than any other.


Let's try shared authority and collaboration at this point.


> >
> > The types are object, u8, u16, u32, u64, s8, s16, s32, s64, f32, f64,
> > c64, and c128.
> >
>
> In addition to the types you list above, I would personally benefit
> from complex integer formats.  We use NumPy at work, and it is very
> inconvenient that we have to declare an N-by-2 array (translated to
> your syntax):
>
>        (make-array (list N 2) #:type s16)
>
> When what we really want is a one dimensional array of N complex 16
> bit integers:
>
>        (make-array (list N) #:type cs16)


Most of the type names above (and certainly the style of them) are from the
SRFI 4 library. One of my next questions to the list was whether to keep
this or use the C-vector style. I like this one for conciseness. I also like
your suggestion below for cf32 and cf64 instead of c64 and c128. It does
allow extending to things like cs32, for example.

> If your library takes off, my guess is that someone will ask you for
> quaternions and arrays of structures (RGBA pixels for instance)
> too.  :-)


I was already planning on quaternions (since I do a lot of simulation work).
Of course, you can have arrays of anything you want using the object type. We
should make it extensible by exposing the vtype abstraction in a useful
manner.
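
Something like this might be the shape of it (a hypothetical sketch -
make-vtype and the field names are placeholders, not the actual
abstraction): a vtype bundles an allocator with typed ref/store operations,
so users could register their own element types.

    (require srfi/4)

    ;; A vtype descriptor: how to allocate backing storage and how to
    ;; read/write one element of it.
    (define-struct vtype (alloc ref store))

    ;; The f64 vtype, built directly on SRFI 4 vectors:
    (define f64-vtype
      (make-vtype make-f64vector f64vector-ref f64vector-set!))

    ;; An RGBA pixel vtype could pack four u8 components per element on
    ;; top of a u8vector in the same way.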

> (As a silly style suggestion, I think you should call your complex
> floating point types cf32 and cf64 instead of c64 and c128.)


I rather like this and will do that now.

> I think other items will eventually crop up too... For instance,
> you'll probably want to be able to read data off disk or serialize it
> across a network.  In this case, you'll need to keep track of the
> endianness.  (Thankfully VAX floating point is mostly dead. :-)


I'll probably extend the packed-binary package (
http://planet.plt-scheme.org/display.ss?package=packed-binary.plt&owner=williams)
for that.
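
Byte-order handling itself is simple enough to sketch in plain Scheme (this
is illustrative only, not packed-binary's API):

    ;; Assemble a 16-bit unsigned value from two bytes, in either order.
    (define (u16-le lo hi) (+ lo (* 256 hi)))   ; little-endian
    (define (u16-be hi lo) (+ (* 256 hi) lo))   ; big-endian

    ;; (u16-le #x34 #x12) => #x1234
    ;; (u16-be #x12 #x34) => #x1234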

> This thread has already started discussing how the memory for typed
> arrays could be handled specially with respect to the garbage
> collector.  Someone will want to process huge arrays with memory
> mapped files (larger than can possibly fit in RAM).  I think you'll
> find that eventually you want to implement your arrays on top of a
> generic "buffer of bytes" type like R6RS bytevectors.


But, that is one for the future, I think.


> I sincerely hope that none of this discourages you!  I just wanted to
> point out a few details I think you'll run into sooner or later.
> Maybe that will spare you some of the backwards compatibility baggage
> that other array libraries are stuck with.


I was aware of most of them, but it helps to have someone keep me honest.