[racket] DSL for multi-dimensional datasets?

From: Simon Haines (simon.haines at con-amalgamate.net)
Date: Mon Nov 5 23:22:49 EST 2012

As part of my work, I frequently have to 'shape' multi-dimensional
datasets. This is reasonably easy to do in Racket and I'm thinking about
pulling together some of the functions I use into a library. Before I do
this though, I was wondering if there is any similar work I can build upon,
or perhaps use to guide me.

As an example of what I mean, I'll receive from a colleague a file like
this:

Date, Site, Total Alkalinity as CaCO3 (mg/L), Carbonate as CaCO3 (mg/L)
1-Nov-12, BH1, 120, <5
1-Nov-12, BH2, 180, <5
1-Nov-12, BH3, 160, <5
26-Oct-12, BH1, 150, <1
26-Oct-12, BH2, 165, 0
26-Oct-12, BH3, 180, <5

(This is a laboratory analysis of water sampled from bore holes).

This file is composed of two datasets (a set each of total alkalinity and
carbonate), with shared dimensions of 'date' and 'site'. I'll often deal
with files containing up to 80 datasets.
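
A minimal sketch of how such a file might be split into one record per
measurement, so that each dataset shares the 'date' and 'site' dimensions.
The helper names (split-fields, lines->measurements) and the
(date site parameter value) layout are assumptions made for illustration,
not part of any existing library. Values are kept as strings so that
censored results such as "<5" survive unchanged.

```racket
#lang racket

;; A shortened copy of the sample file, one string per line.
(define sample-lines
  '("Date, Site, Total Alkalinity as CaCO3 (mg/L), Carbonate as CaCO3 (mg/L)"
    "1-Nov-12, BH1, 120, <5"
    "1-Nov-12, BH2, 180, <5"
    "26-Oct-12, BH1, 150, <1"))

;; Split a line on commas and trim whitespace around each field.
(define (split-fields line)
  (map string-trim (string-split line ",")))

;; Turn the wide file into a flat list of (date site parameter value).
;; The first two columns are assumed to be the shared dimensions.
(define (lines->measurements lines)
  (define parameters (drop (split-fields (first lines)) 2))
  (append*
   (for/list ([line (in-list (rest lines))])
     (define fields (split-fields line))
     (for/list ([param (in-list parameters)]
                [value (in-list (drop fields 2))])
       (list (first fields) (second fields) param value)))))

(for-each writeln (lines->measurements sample-lines))
```

Each data line yields one record per parameter column, so a file with 80
datasets would simply produce 80 records per line.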

More often than not, all I'll need to do is 'shape' these datasets into a
format that can be pulled into a spreadsheet for further analysis/graphing.
One example is:

"", Total Alkalinity as CaCO3 (mg/L), Carbonate as CaCO3 (mg/L)
BH1
1-Nov-12, 120, <5
26-Oct-12, 150, <1
BH2
1-Nov-12, 180, <5
26-Oct-12, 165, 0
BH3
1-Nov-12, 160, <5
26-Oct-12, 180, <5
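
The shape above could be produced along these lines. This is only a sketch:
report-by-site is a hypothetical name, and the rows are assumed to be
already parsed into lists of strings of the form (date site value ...).

```racket
#lang racket

(define header
  '("Total Alkalinity as CaCO3 (mg/L)" "Carbonate as CaCO3 (mg/L)"))

(define rows
  '(("1-Nov-12"  "BH1" "120" "<5")
    ("1-Nov-12"  "BH2" "180" "<5")
    ("1-Nov-12"  "BH3" "160" "<5")
    ("26-Oct-12" "BH1" "150" "<1")
    ("26-Oct-12" "BH2" "165" "0")
    ("26-Oct-12" "BH3" "180" "<5")))

;; Group rows under each site (in order of first appearance) and emit
;; one "date, value, ..." line per row within the group.
(define (report-by-site header rows)
  (define sites (remove-duplicates (map second rows)))
  (cons (string-join (cons "\"\"" header) ", ")
        (append*
         (for/list ([site (in-list sites)])
           (cons site
                 (for/list ([row (in-list rows)]
                            #:when (equal? (second row) site))
                   (string-join (cons (first row) (drop row 2)) ", ")))))))

(for-each displayln (report-by-site header rows))
```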

Another example:

"", BH1, BH2, BH3
Total Alkalinity as CaCO3 (mg/L)
1-Nov-12, 120, 180, 160
26-Oct-12, 150, 165, 180
Carbonate as CaCO3 (mg/L)
1-Nov-12, <5, <5, <5
26-Oct-12, <1, 0, <5
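
This second shape is a pivot: sites become columns and each parameter gets
its own block of date rows. A rough sketch follows, assuming the same
parsed row layout of (date site value ...). The name report-by-parameter
is hypothetical, and a missing date/site combination is rendered as an
empty cell, which is an assumption rather than a requirement from the data.

```racket
#lang racket

(define header
  '("Total Alkalinity as CaCO3 (mg/L)" "Carbonate as CaCO3 (mg/L)"))

(define rows
  '(("1-Nov-12"  "BH1" "120" "<5")
    ("1-Nov-12"  "BH2" "180" "<5")
    ("1-Nov-12"  "BH3" "160" "<5")
    ("26-Oct-12" "BH1" "150" "<1")
    ("26-Oct-12" "BH2" "165" "0")
    ("26-Oct-12" "BH3" "180" "<5")))

(define (report-by-parameter header rows)
  (define sites (remove-duplicates (map second rows)))
  (define dates (remove-duplicates (map first rows)))
  ;; Index the value columns of each row by its (date . site) pair.
  (define cells
    (for/hash ([row (in-list rows)])
      (values (cons (first row) (second row)) (drop row 2))))
  ;; Look up the i-th parameter value for a date and site.
  (define (cell date site i)
    (define vals (hash-ref cells (cons date site) #f))
    (if vals (list-ref vals i) ""))
  (cons (string-join (cons "\"\"" sites) ", ")
        (append*
         (for/list ([param (in-list header)]
                    [i (in-naturals)])
           (cons param
                 (for/list ([date (in-list dates)])
                   (string-join
                    (cons date
                          (for/list ([site (in-list sites)])
                            (cell date site i)))
                    ", ")))))))

(for-each displayln (report-by-parameter header rows))
```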

As you can see, the recursive nature of these reports makes them ideal for
processing with Racket. Although it takes me a little while to get the
format of a report right, I can usually add the report to my toolbox for
whenever it's needed later.
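
To illustrate the recursive structure, here is a small sketch of a generic
grouping function. The name nest and the idea of passing an ordered list of
key functions (outermost dimension first) are assumptions for illustration
only. Each level groups the records by one dimension and recurs on the
remaining dimensions.

```racket
#lang racket

;; Nest records by a list of key functions, outermost first.
;; With no keys left, the records themselves are the leaves.
(define (nest records keys)
  (if (null? keys)
      records
      (for/list ([k (in-list (remove-duplicates (map (first keys) records)))])
        (cons k
              (nest (filter (lambda (r) (equal? ((first keys) r) k)) records)
                    (rest keys))))))

(define records
  '(("1-Nov-12"  "BH1" "120")
    ("1-Nov-12"  "BH2" "180")
    ("26-Oct-12" "BH1" "150")))

;; Group by site, then by date within each site.
(pretty-write (nest records (list second first)))
```

Swapping the order of the key functions gives the date-then-site nesting
without any other change.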

So I've started drafting what I think a good DSL for doing this type of
task might be, something like:
(define-dataset
  (date (date "d-MMM-yy"))
  (site (text))
  (parameter (text)) ...)

(define-report example1
  (columns (parameter ...))
  (rows ((site) date)))

I haven't worked out the details yet, and I'm not sure the above will work
the way I want it to. I've had a quick look at Microsoft's Scientific
DataSet (http://sds.codeplex.com/), but it lacks the composability I'm used
to with Racket. Is anyone aware of any similar work that does this, or that
I could use as a guide?

Thanks,
Simon.

Posted on the users mailing list.