Interestingly, if I use (in-vector ...) in the for/fold clauses, the times for the for/fold version are about the same as for the (uglier) do version that uses vector-ref. [This version would probably benefit from Matthew's performance improvements.] Actually using it would mean giving up the flexibility that motivated moving to sequences in the first place, but it does give some hope of eventually getting the same performance from the sequence versions (at least for vectors).<br>
<br>using in-vector in the for<br>cpu time: 266 real time: 265 gc time: 0<br>cpu time: 250 real time: 250 gc time: 47<br><br>current science collection routines<br>cpu time: 250 real time: 249 gc time: 0<br>cpu time: 218 real time: 218 gc time: 16<br>
<br>It would be nice if (for ((x some-vector)) ...) and (for ((x (in-vector some-vector))) ...) had similar performance. I realize that at expansion time the latter knows to expect a vector and can generate code accordingly, while the former does not. But, I can dream.<br>
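<br>A minimal sketch of the two forms being compared, for anyone who wants to reproduce the difference. The names sum-generic and sum-vector are hypothetical (they are not routines from the science collection), and the sketch assumes `#lang racket/base`; under PLT Scheme 4.x the equivalent language would be `scheme/base`.<br>

```racket
#lang racket/base
;; Hypothetical helpers for illustration only; not science collection code.

;; Generic form: `data` may be any sequence (vector, list, ...),
;; so the loop has to dispatch on the sequence type at run time.
(define (sum-generic data)
  (for/fold ([sum 0.0]) ([x data])
    (+ sum x)))

;; Specialized form: `in-vector` appears directly in the clause,
;; so the expander knows to expect a vector and can emit vector-specific code.
(define (sum-vector data)
  (for/fold ([sum 0.0]) ([x (in-vector data)])
    (+ sum x)))

;; Same shape as the timings in this thread: a 100000-element vector, 10 passes.
(define data (build-vector 100000 (lambda (i) (exact->inexact i))))

(time (for ([i (in-range 10)]) (sum-generic data)))
(time (for ([i (in-range 10)]) (sum-vector data)))
```

Both functions return the same result for a vector; only sum-generic also accepts a list, which is the flexibility being traded against speed.<br>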
<br><div class="gmail_quote">On Wed, Sep 9, 2009 at 9:37 AM, Doug Williams <span dir="ltr"><<a href="mailto:m.douglas.williams@gmail.com">m.douglas.williams@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Thanks for running them for me. I guess it comes down to whether the flexibility is worth the performance hit. I like the flexibility. In the past I have had to convert lists to vectors just to compute statistics on them, which is even less efficient. I could include the old ones as vector-mean, vector-variance, etc. for people who need/want the performance.<br>
<font color="#888888">
<br>Doug</font><div><div></div><div class="h5"><br><br><div class="gmail_quote">On Wed, Sep 9, 2009 at 9:29 AM, Matthew Flatt <span dir="ltr"><<a href="mailto:mflatt@cs.utah.edu" target="_blank">mflatt@cs.utah.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I don't think the latest changes will affect the performance, since<br>
unsafe operations are only used for `in-vector' and (sometimes)<br>
`in-range' when they appear immediately in a `for' right-hand side.<br>
<br>
Times on my machine:<br>
<br>
New<br>
laptop% mzscheme time-statistics.ss<br>
cpu time: 576 real time: 578 gc time: 11<br>
cpu time: 450 real time: 451 gc time: 10<br>
(that's without `in-vector'; times using `in-vector' are the same)<br>
<br>
Old<br>
laptop% mzscheme time-statistics.ss<br>
cpu time: 233 real time: 237 gc time: 18<br>
cpu time: 196 real time: 198 gc time: 10<br>
<div><div></div><div><br>
At Wed, 9 Sep 2009 09:18:55 -0600, Doug Williams wrote:<br>
> I've reimplemented the statistics module from the science collection to use<br>
> sequences instead of just vectors. I like the generality better - I can use<br>
> any sequence (e.g., vector or list) - but there is more of a performance hit<br>
> than I would have liked. I haven't timed it with the new changes that<br>
> Matthew just put in. The good news is that there isn't much of a hit for<br>
> using (variance data) as opposed to (variance (in-vector data)) and there<br>
> isn't a huge hit for using the contract that ensures that the sequence is a<br>
> sequence of real numbers.<br>
><br>
> I created a 100000 element vector and timed a loop getting the variance of<br>
> the elements 10 times. Note that I create an executable that runs compiled<br>
> code in both cases. [Runs of the sequence code within DrScheme are about<br>
> twice the times of the compiled code - I assume they run from byte code in<br>
> that case. Runs of the science collection code are about the same in DrScheme<br>
> - I assume they run the compiled code.]<br>
><br>
> Times using sequences [primarily using 'for/fold' for sequencing and<br>
> referencing]:<br>
><br>
> (variance data) : cpu time: 625 real time: 625 gc time: 32<br>
> (unchecked-variance data) : cpu time: 531 real time: 531 gc time: 77<br>
><br>
> (variance (in-vector data)) : cpu time: 609 real time: 609 gc time: 16<br>
> (unchecked-variance (in-vector data)) : cpu time: 485 real time: 484 gc<br>
> time: 0<br>
><br>
> Times using vectors (current science collection routines) [primarily using<br>
> 'do' for sequencing with 'vector-ref' for referencing]:<br>
><br>
> (variance data) : cpu time: 235 real time: 234 gc time: 16<br>
> (unchecked-variance data) : cpu time: 187 real time: 188 gc time: 46<br>
><br>
> All of the normal caveats about timing values apply - just because I'm<br>
> timing a statistics routine doesn't mean it's statistically relevant :).<br>
><br>
> I will retime them when a nightly build with Matthew's performance<br>
> improvements is available (it seems that 4.2.1.7 from Saturday is the<br>
> latest) - or I build it on my machine at home. I don't have the development<br>
> tools on my laptop to build from svn.<br>
><br>
> I've attached the files in case anyone wants to look them over. If someone<br>
> could run them against the latest svn, it would be nice.<br>
><br>
> Comments from anyone that uses these routines from the science collection<br>
> would be most welcome.<br>
><br>
> Doug<br>
<br>
</div></div></blockquote></div><br>
</div></div></blockquote></div><br>