[plt-scheme] firefox locking up 15 seconds for each plt documentation search

From: Eli Barzilay (eli at barzilay.org)
Date: Sun Jun 7 12:30:30 EDT 2009

On Jun  7, Dave Herman wrote:
> > I'm happy to hear any other ideas people might have.
> 
> I don't know how the search algorithm works, but if the algorithm is
> fairly simple and unchanging, and it's just the index that changes
> frequently, it seems to me a middle ground solution to the problems
> with having two different search implementations would be:
> 
> - build one index
> - from this index, auto-generate the data in two separate formats,
>   one Scheme-friendly, one JS-friendly
> - implement the search algorithm separately in both Scheme and JS
> 
> This is not as over-engineered as, say, compiling the Scheme search
> algorithm to JS, but it should mean the two implementations aren't
> as likely to diverge in behavior since they're both using the same
> index.

The problem is that the complication is not something that can be
folded into an index.  Here are some factors that change the scoring
(it's not really scoring at the moment, but see below):

* If there are exact matches for some "foo-bar", then they should be
  first.

* If there are "foo-bar.*" keywords, they should appear soon after.

* Same goes for keywords that have a "foo" in them (ie, matches for
  "\bfoo\b"), of course having both "foo" and "bar" should count
  higher.

* Next should be matches for "\bfoo".

* Matches from scheme/* are preferred over other modules.

* ... but that's just an approximation, and eventually there should be
  a way to prefer other modules (for example, r5rs should prefer its
  own bindings, but still show others; and OTOH, BSL should show only
  its own matches and perhaps teachpack matches).

* Further complicated by having multiple keywords specified with some
  operatos between them (currently the terms are all `and'ed though).

* And finally there is the "context query" thing which is used to
  pre-filter the vector to search.

As you can see, not only is this not trivial code, it's very fragile
in the sense that it needs to change when there are new surprises.
For example, I never thought of the whole word-boundary thing, until
people complained that if, for example, you're looking for
"copy-vector", then you should see the match for "vector-copy!".

(And of course that implies that Scheme would be much better to do all
of this, since then I can have a testsuite in the form of checking
such expectations whenever there are changes to the rules, weights,
orderings, etc.)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.