[racket-dev] String search in mred/private/wxme: knuth-bendix?!

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Thu Nov 29 20:27:37 EST 2012

On Thu, Nov 29, 2012 at 5:15 PM, Matthew Flatt <mflatt at cs.utah.edu> wrote:
>> Assuming that it is KMP, is there a reason why we're not using Boyer-Moore
>> here instead?  My understanding was BM was faster than KMP for common
>> situations.

IIUC, the BM speedup comes from skipping over portions of the text. In
our case, that could be handy when you skip entire snips, but if you
don't skip them, then I think it wouldn't really help at all. And I
think you skip things that are at most the length of the search
string, so it won't help that much in practice for DrRacket. ... I
think.

> KMP is just the algorithm I knew at the time.
>
> Since `find-string' can find "editor:info-mixin searching%" at the end
> of "collects/frameworks/private/text.rkt" in 15-20ms, I doubt that it
> has been a bottleneck for, say, searching in DrRacket.

I do agree that the searching algorithm doesn't seem to be a limiting
factor, but I don't think this analysis is quite right. Two points: I
assume you did this test with a text% object. In DrRacket, of course,
it colors the buffer, which means 3.5 times as many snips, which seems
to take about twice as long (I believe that if every snip is one
character things can get really bad but that's not common). Still not
too bad, but the other is that when someone types something in
DrRacket, it searches interactively and it finds all of the results,
not just the first hit. I can get this to take in 100s of milliseconds
on big files if I type things like a space, which starts to matter.

Of course, since I've been in that kind of mood, DrRacket can now
break these callbacks up, it shouldn't be a problem in practice.
(There is still some hidden sluggishness in DrRacket surrounding
searching tho. I think. I haven't been able to put my finger on it
yet.)

Robby

Posted on the dev mailing list.