Thanks for running them for me. I guess it comes down to whether the flexibility is worth the performance hit. I like the flexibility. In the past there were times I had to convert lists to vectors just to compute statistics on them, which is even less efficient. I could include the old ones as vector-mean, vector-variance, etc. for people who need or want the performance.

Doug

<div class="gmail_quote">On Wed, Sep 9, 2009 at 9:29 AM, Matthew Flatt <<a href="mailto:mflatt@cs.utah.edu">mflatt@cs.utah.edu</a>> wrote:
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I don't think the latest changes will affect the performance, since unsafe operations are only used for `in-vector' and (sometimes) `in-range' when they appear immediately in a `for' right-hand side.

Times on my machine:

New laptop% mzscheme time-statistics.ss
cpu time: 576 real time: 578 gc time: 11
cpu time: 450 real time: 451 gc time: 10
(that's without `in-vector'; times using `in-vector' are the same)

Old laptop% mzscheme time-statistics.ss
cpu time: 233 real time: 237 gc time: 18
cpu time: 196 real time: 198 gc time: 10

<div><div></div><div class="h5">
At Wed, 9 Sep 2009 09:18:55 -0600, Doug Williams wrote:
> I've reimplemented the statistics module from the science collection to use
> sequences instead of just vectors. I like the generality better - I can use
> any sequence (e.g., vector or list) - but there is more of a performance hit
> than I would have liked. I haven't timed it with the new changes that
> Matthew just put in. The good news is that there isn't much of a hit for
> using (variance data) as opposed to (variance (in-vector data)) and there
> isn't a huge hit for using the contract that ensures that the sequence is a
> sequence of real numbers.
>
> I created a 100000 element vector and timed a loop getting the variance of
> the elements 10 times.
> Note that I created an executable that runs compiled
> code in both cases. [Runs of the sequence code within DrScheme take about
> twice as long as the compiled code - I assume they run from byte code in
> that case. Runs of the science collection code take about the same time in
> DrScheme - I assume they run the compiled code.]
>
> Times using sequences [primarily using 'for/fold' for sequencing and
> referencing]:
>
> (variance data) : cpu time: 625 real time: 625 gc time: 32
> (unchecked-variance data) : cpu time: 531 real time: 531 gc time: 77
>
> (variance (in-vector data)) : cpu time: 609 real time: 609 gc time: 16
> (unchecked-variance (in-vector data)) : cpu time: 485 real time: 484 gc time: 0
>
> Times using vectors (current science collection routines) [primarily using
> 'do' for sequencing with 'vector-ref' for referencing]:
>
> (variance data) : cpu time: 235 real time: 234 gc time: 16
> (unchecked-variance data) : cpu time: 187 real time: 188 gc time: 46
>
> All of the normal caveats about timing values apply - just because I'm
> timing a statistics routine doesn't mean it's statistically relevant :).
>
> I will retime them when a nightly build with Matthew's performance
> improvements is available (it seems that 4.2.1.7 from Saturday is the
> latest) - or when I build it on my machine at home. I don't have the
> development tools on my laptop to build from svn.
>
> I've attached the files in case anyone wants to look them over. If someone
> could run them against the latest svn, it would be nice.
>
> Comments from anyone that uses these routines from the science collection
> would be most welcome.
>
> Doug
</div></div></blockquote></div>
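
For readers without the attachments, here is a rough sketch of the two styles being compared: a sequence-based variance built on `for/fold`, and a vector-only variance built on `do` with `vector-ref`. This is not the science collection's code. The names `seq-mean`, `seq-variance`, `vec-mean` and `vec-variance` are hypothetical, the two-pass algorithm and the n-1 (sample variance) denominator are assumptions, and the module is written for current Racket rather than the 4.2.1 builds discussed above.

```racket
#lang racket/base

;; Sequence-based versions (hypothetical names): accept any sequence,
;; e.g. a list, a vector, or (in-vector v).
(define (seq-mean data)
  (define-values (n sum)
    (for/fold ([n 0] [sum 0.0]) ([x data])
      (values (add1 n) (+ sum x))))
  (/ sum n))

;; Two-pass sample variance (n-1 denominator is an assumption).
;; The sequence is traversed twice, so it must be re-iterable.
(define (seq-variance data)
  (define m (seq-mean data))
  (define-values (n ss)
    (for/fold ([n 0] [ss 0.0]) ([x data])
      (let ([d (- x m)])
        (values (add1 n) (+ ss (* d d))))))
  (/ ss (sub1 n)))

;; Vector-only versions (hypothetical names) using `do' and `vector-ref'.
(define (vec-mean v)
  (define n (vector-length v))
  (do ([i 0 (add1 i)]
       [sum 0.0 (+ sum (vector-ref v i))])
      ((= i n) (/ sum n))))

(define (vec-variance v)
  (define n (vector-length v))
  (define m (vec-mean v))
  (do ([i 0 (add1 i)]
       [ss 0.0 (+ ss (let ([d (- (vector-ref v i) m)]) (* d d)))])
      ((= i n) (/ ss (sub1 n)))))

;; Small check: both styles agree on the same data.
(seq-variance '(1 2 3 4 5))         ; => 2.5
(vec-variance (vector 1 2 3 4 5))   ; => 2.5

;; Timing loop in the spirit of the test described above:
;; a 100000 element vector, variance computed 10 times per style.
(define data (build-vector 100000 (lambda (i) (exact->inexact i))))
(time (for ([i (in-range 10)]) (seq-variance data)))
(time (for ([i (in-range 10)]) (seq-variance (in-vector data))))
(time (for ([i (in-range 10)]) (vec-variance data)))
```

The sequence versions take a list or a vector unchanged, which is the flexibility argued for above; the `do` versions only work on vectors. Timings from this sketch will not match the numbers in the thread, since it includes neither the contract checks nor the actual algorithms from the attached files.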