Thanks for running them for me. I guess it comes down to whether the flexibility is worth the performance hit. I like the flexibility. In the past I have had to convert lists to vectors just to compute statistics on them, which is even less efficient. I could include the old ones as vector-mean, vector-variance, etc., for people who need or want the performance.<br>
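<br>
For anyone following along without the attachments, here is a rough sketch of the two styles being compared. It is not the science collection code: the names `seq-variance` and `vec-variance` are made up for illustration, the running-mean update is just one reasonable way to get a sample variance in a single pass, and it assumes a current `racket/base` rather than the 4.2.1 `scheme/base` of this thread.<br>

```racket
#lang racket/base

;; Sequence style: 'for/fold' over any sequence (list, vector, in-vector, ...).
;; Hypothetical name; single pass with a running mean (Welford's update).
(define (seq-variance data)
  (define-values (n mean m2)
    (for/fold ([n 0] [mean 0.0] [m2 0.0])
              ([x data])
      (let* ([k (add1 n)]
             [delta (- x mean)]
             [new-mean (+ mean (/ delta k))])
        (values k new-mean (+ m2 (* delta (- x new-mean)))))))
  ;; Sample variance: divide by n - 1.
  (/ m2 (sub1 n)))

;; Vector style: 'do' for sequencing with 'vector-ref' for referencing.
;; Hypothetical name; same arithmetic as above, vectors only.
;; All 'do' step expressions see the values from the previous iteration.
(define (vec-variance v)
  (define n (vector-length v))
  (do ([i 0 (add1 i)]
       [mean 0.0 (+ mean (/ (- (vector-ref v i) mean) (add1 i)))]
       [m2 0.0 (let* ([x (vector-ref v i)]
                      [delta (- x mean)]
                      [new-mean (+ mean (/ delta (add1 i)))])
                 (+ m2 (* delta (- x new-mean))))])
      ((= i n) (/ m2 (sub1 n)))))

;; Timing loop in the spirit of the test described in the thread:
;; a 100000-element vector, variance computed 10 times per style.
(define data
  (build-vector 100000 (lambda (i) (exact->inexact (modulo (* i 7919) 1000)))))

(time (for ([i (in-range 10)]) (seq-variance data)))
(time (for ([i (in-range 10)]) (vec-variance data)))
```

The sequence version also accepts a list directly, e.g. `(seq-variance '(1.0 2.0 3.0 4.0))`, while the vector version would need the list converted first. Absolute times will depend on the machine and the build.<br>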
<br>Doug<br><br><div class="gmail_quote">On Wed, Sep 9, 2009 at 9:29 AM, Matthew Flatt <span dir="ltr"><<a href="mailto:mflatt@cs.utah.edu">mflatt@cs.utah.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I don't think the latest changes will affect the performance, since<br>
unsafe operations are only used for `in-vector' and (sometimes)<br>
`in-range' when they appear immediately in a `for' right-hand side.<br>
<br>
Times on my machine:<br>
<br>
New<br>
laptop% mzscheme time-statistics.ss<br>
cpu time: 576 real time: 578 gc time: 11<br>
cpu time: 450 real time: 451 gc time: 10<br>
(that's without `in-vector'; times using `in-vector' are the same)<br>
<br>
Old<br>
laptop% mzscheme time-statistics.ss<br>
cpu time: 233 real time: 237 gc time: 18<br>
cpu time: 196 real time: 198 gc time: 10<br>
<div><div></div><div class="h5"><br>
At Wed, 9 Sep 2009 09:18:55 -0600, Doug Williams wrote:<br>
> I've reimplemented the statistics module from the science collection to use<br>
> sequences instead of just vectors. I like the generality better - I can use<br>
> any sequence (e.g., vector or list) - but there is more of a performance hit<br>
> than I would have liked. I haven't timed it with the new changes that<br>
> Matthew just put in. The good news is that there isn't much of a hit for<br>
> using (variance data) as opposed to (variance (in-vector data)) and there<br>
> isn't a huge hit for using the contract that ensures that the sequence is a<br>
> sequence of real numbers.<br>
><br>
> I created a 100000 element vector and timed a loop getting the variance of<br>
> the elements 10 times. Note that I create an executable that runs compiled<br>
> code in both cases. [Runs of the sequence code within DrScheme are about<br>
> twice the times of the compiled code - I assume they run from byte code in<br>
> that case. Runs of the science collection code are about the same in DrScheme<br>
> - I assume they run the compiled code.]<br>
><br>
> Times using sequences [primarily using 'for/fold' for sequencing and<br>
> referencing]:<br>
><br>
> (variance data) : cpu time: 625 real time: 625 gc time: 32<br>
> (unchecked-variance data) : cpu time: 531 real time: 531 gc time: 77<br>
><br>
> (variance (in-vector data)) : cpu time: 609 real time: 609 gc time: 16<br>
> (unchecked-variance (in-vector data)) : cpu time: 485 real time: 484 gc time: 0<br>
><br>
> Times using vectors (current science collection routines) [primarily using<br>
> 'do' for sequencing with 'vector-ref' for referencing]:<br>
><br>
> (variance data) : cpu time: 235 real time: 234 gc time: 16<br>
> (unchecked-variance data) : cpu time: 187 real time: 188 gc time: 46<br>
><br>
> All of the normal caveats about timing values apply - just because I'm<br>
> timing a statistics routine doesn't mean it's statistically relevant :).<br>
><br>
> I will retime them when a nightly build with Matthew's performance<br>
> improvements is available (it seems that 4.2.1.7 from Saturday is the<br>
> latest) - or I build it on my machine at home. I don't have the development<br>
> tools on my laptop to build from svn.<br>
><br>
> I've attached the files in case anyone wants to look them over. If someone<br>
> could run them against the latest svn, it would be nice.<br>
><br>
> Comments from anyone that uses these routines from the science collection<br>
> would be most welcome.<br>
><br>
> Doug<br>
<br>
</div></div></blockquote></div><br>