<span style="font-family:arial,sans-serif;font-size:13px">As part of my work, I frequently have to 'shape' multi-dimensional datasets. This is reasonably easy to do in Racket, and I'm thinking about pulling some of the functions I use together into a library. Before I do, though, I was wondering whether there is any similar work I can build upon, or use as a guide.</span><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">As an example of what I mean, I'll receive from a colleague a file like this:</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div>
<div style="font-family:arial,sans-serif;font-size:13px">Date, Site, Total Alkalinity as CaCO3 (mg/L), Carbonate as CaCO3 (mg/L)</div><div style="font-family:arial,sans-serif;font-size:13px">1-Nov-12, BH1, 120, <5<br>
</div><div style="font-family:arial,sans-serif;font-size:13px">1-Nov-12, BH2, 180, <5</div><div style="font-family:arial,sans-serif;font-size:13px">1-Nov-12, BH3, 160, <5</div><div style="font-family:arial,sans-serif;font-size:13px">
26-Oct-12, BH1, 150, <1</div><div style="font-family:arial,sans-serif;font-size:13px">26-Oct-12, BH2, 165, 0</div><div style="font-family:arial,sans-serif;font-size:13px">26-Oct-12, BH3, 180, <5</div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">(This is a laboratory analysis of water sampled from boreholes.)</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">
This file contains two datasets (one each for total alkalinity and carbonate) that share the dimensions 'date' and 'site'. I often deal with files containing up to 80 datasets.</div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">More often than not, all I'll need to do is 'shape' these datasets into a format that can be pulled into a spreadsheet for further analysis/graphing. One example is:</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">"", Total Alkalinity as CaCO3 (mg/L), Carbonate as CaCO3 (mg/L)</div><div style="font-family:arial,sans-serif;font-size:13px">
BH1</div><div style="font-family:arial,sans-serif;font-size:13px">1-Nov-12, 120, <5</div><div style="font-family:arial,sans-serif;font-size:13px">26-Oct-12, 150, <1</div><div style="font-family:arial,sans-serif;font-size:13px">
BH2</div><div style="font-family:arial,sans-serif;font-size:13px">1-Nov-12, 180, <5</div><div style="font-family:arial,sans-serif;font-size:13px">26-Oct-12, 165, 0</div><div style="font-family:arial,sans-serif;font-size:13px">
BH3</div><div style="font-family:arial,sans-serif;font-size:13px">1-Nov-12, 160, <5</div><div style="font-family:arial,sans-serif;font-size:13px">26-Oct-12, 180, <5</div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">Another example:</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">"", BH1, BH2, BH3</div>
<div style="font-family:arial,sans-serif;font-size:13px">Total Alkalinity as CaCO3 (mg/L)<br></div><div style="font-family:arial,sans-serif;font-size:13px">1-Nov-12, 120, 180, 160</div><div style="font-family:arial,sans-serif;font-size:13px">
26-Oct-12, 150, 165, 180</div><div style="font-family:arial,sans-serif;font-size:13px">Carbonate as CaCO3 (mg/L)<br></div><div style="font-family:arial,sans-serif;font-size:13px">1-Nov-12, <5, <5, <5</div><div style="font-family:arial,sans-serif;font-size:13px">
26-Oct-12, <1, 0, <5</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">As you can see, the recursive nature of these reports makes them ideal for processing with Racket. Although it takes me a little while to get the format of a report right, I can usually add the report to my toolbox for whenever it's needed later.</div>
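<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">To make the two shapes concrete, here is a minimal sketch of both reports as plain Racket functions. It assumes the file has already been read into a header list and a list of rows of strings; the names by-site and by-parameter are made up for illustration and are not part of any existing library.</div>

```racket
#lang racket

;; Assumption: the file is already parsed into a header and rows of strings.
(define header
  '("Date" "Site" "Total Alkalinity as CaCO3 (mg/L)" "Carbonate as CaCO3 (mg/L)"))
(define rows
  '(("1-Nov-12"  "BH1" "120" "<5")
    ("1-Nov-12"  "BH2" "180" "<5")
    ("1-Nov-12"  "BH3" "160" "<5")
    ("26-Oct-12" "BH1" "150" "<1")
    ("26-Oct-12" "BH2" "165" "0")
    ("26-Oct-12" "BH3" "180" "<5")))

;; First report: one block per site, one "date, values ..." line per sample.
;; Returns the report as a list of output lines.
(define (by-site header rows)
  (cons (string-join (cons "\"\"" (drop header 2)) ", ")
        (append*
         (for/list ([group (in-list (group-by second rows))])
           (cons (second (first group))            ; site name on its own line
                 (for/list ([r (in-list group)])
                   (string-join (cons (first r) (drop r 2)) ", ")))))))

;; Second report: one block per parameter, sites across the columns.
(define (by-parameter header rows)
  (define sites (remove-duplicates (map second rows)))
  (define dates (remove-duplicates (map first rows)))
  ;; Index rows by (date site) so each cell can be looked up directly.
  (define table
    (for/hash ([r (in-list rows)])
      (values (list (first r) (second r)) r)))
  ;; A missing date/site combination becomes an empty cell.
  (define (cell date site i)
    (define r (hash-ref table (list date site) #f))
    (if r (list-ref r i) ""))
  (cons (string-join (cons "\"\"" sites) ", ")
        (append*
         (for/list ([param (in-list (drop header 2))]
                    [i (in-naturals 2)])          ; column index of this parameter
           (cons param
                 (for/list ([d (in-list dates)])
                   (string-join
                    (cons d (for/list ([s (in-list sites)]) (cell d s i)))
                    ", ")))))))

(for-each displayln (by-site header rows))
(newline)
(for-each displayln (by-parameter header rows))
```

<div style="font-family:arial,sans-serif;font-size:13px">Both functions hard-code which dimension goes where, which is exactly the repetition a library would need to factor out.</div>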
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">So I've started drafting what I think a good DSL for doing this type of task might be, something like:</div>
<div style="font-family:arial,sans-serif;font-size:13px">(define-dataset</div><div style="font-family:arial,sans-serif;font-size:13px"> (date (date "d-MMM-yy"))</div><div style="font-family:arial,sans-serif;font-size:13px">
(site (text))</div><div style="font-family:arial,sans-serif;font-size:13px"> (parameter (text)) ...)</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">
(define-report example1</div><div style="font-family:arial,sans-serif;font-size:13px"> (columns (parameter ...))</div><div style="font-family:arial,sans-serif;font-size:13px"> (rows ((site) date)))</div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">I haven't worked out the details yet, and I'm not sure the above will work the way I want it to. I've had a quick look at Microsoft's Scientific DataSet (<a href="http://sds.codeplex.com/" target="_blank">http://sds.codeplex.com/</a>), but it lacks the composability I'm used to with Racket. Is anyone aware of any similar work that does this, or that I could use as a guide?</div>
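<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">For what it's worth, one possible reading of the (rows ((site) date)) clause is a recursive grouping over a list of key functions. The sketch below is only that, a sketch: nest is a hypothetical helper name, the rows are assumed to be lists of strings, and a real define-report would presumably expand into something along these lines rather than call it directly.</div>

```racket
#lang racket

;; Assumption: each row is (date site value ...), all strings.
(define sample
  '(("1-Nov-12"  "BH1" "120" "<5")
    ("1-Nov-12"  "BH2" "180" "<5")
    ("26-Oct-12" "BH1" "150" "<1")
    ("26-Oct-12" "BH2" "165" "0")))

;; nest : group rows by the first key, then recursively by the remaining keys.
;; Each level is a list of (key-value . sub-groups); the leaves are the rows.
(define (nest rows keys)
  (if (null? keys)
      rows
      (for/list ([group (in-list (group-by (first keys) rows))])
        (cons ((first keys) (first group))
              (nest group (rest keys))))))

;; Site on the outside, date on the inside, as in (rows ((site) date)).
(define site  second)
(define date  first)
(pretty-print (nest sample (list site date)))
```

<div style="font-family:arial,sans-serif;font-size:13px">Swapping the order of the keys, or adding a third, changes the nesting without touching the function, which is the kind of composability I'm after.</div>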
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Thanks,</div><div style="font-family:arial,sans-serif;font-size:13px">Simon.</div>