[racket-dev] Potential search improvement

From: Eli Barzilay (eli at barzilay.org)
Date: Tue May 29 07:17:16 EDT 2012

I have made a possibly useful improvement to the JS search code.
It's not pushed, yet, but I dropped the revised JS code on the
pre-built pages so you can try it out here:

  http://pre.racket-lang.org/docs/html/search/

and compare searches with the usual page:

  http://docs.racket-lang.org/search/

I'd appreciate people playing with it to find out about potential
problems with the ordering and possibly with different browsers.


** More about the change (especially if you want to try to improve
   things):

This is not real ranking, but it should give better results overall.
The search assigns a small integer "score" for each term, where the
scores are (roughly)

  0 no match,
  1 match-all-subword-parts,
  2 contains a match,
  3 matches a prefix,
  4 exact match.

These used to be lumped into two groups, with exact matches first.
Now I made each of these be in its own group, so there's a little
more order.  To see an example that works nicely now, try "splay".
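
As a rough sketch of the grouping described above (not the actual
search code): the names `scoreTerm' and `groupByScore' are made up for
illustration, and the rule used for "match-all-subword-parts" is only a
guess, namely that every alphanumeric piece of the term occurs
somewhere in the entry name.  Because the scores are small integers,
one bucket per score gives the ordering without a real sort.

```javascript
// Hypothetical scoring, following the rough 0..4 scale in the post.
function scoreTerm(term, name) {
  if (name === term) return 4;          // exact match
  if (name.startsWith(term)) return 3;  // matches a prefix
  if (name.includes(term)) return 2;    // contains a match
  // ASSUMPTION: "match-all-subword-parts" means each alphanumeric
  // piece of the term appears somewhere in the name.
  const parts = term.split(/[^a-zA-Z0-9]+/).filter((p) => p.length > 0);
  if (parts.length > 1 && parts.every((p) => name.includes(p))) return 1;
  return 0;                             // no match
}

// One bucket per score; emit buckets from highest to lowest score.
// Entries keep their original relative order inside each bucket.
function groupByScore(term, names) {
  const buckets = [[], [], [], [], []];
  for (const name of names) {
    buckets[scoreTerm(term, name)].push(name);
  }
  const results = [];
  for (let s = 4; s >= 1; s--) {
    results.push(...buckets[s]);        // score 0 is dropped
  }
  return results;
}

const ordered = groupByScore("splay", ["display", "splay-tree", "splay", "car"]);
```

With the old two-group scheme, "splay-tree" and "display" would land in
the same group; here the prefix match comes before the plain substring
match.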

This doesn't solve all problems...  To see problematic things (that
Neil has complained about in the past) try:

  * "port" (gives precedence to exact matches, but the reference
    entries are better; better now with the chapters appearing right
    after the exact binding matches).

  * "fold" (same problem, where it could be argued that for most
    people "foldl" from `racket/base' is better than "fold" from the
    DMdA languages and `srfi/1').

Some of the problem comes from having no preferences for the results.
Such preferences are not hard to implement, but they connect two
unrelated pieces of code (the score assignments in the JS search, and
the bonus for each manual) and it can quickly get into sticky
questions.

Another aspect of the problem is that there are N search terms, not
just one.  Currently, the scores for the terms are combined with a
`min'; a `max' tends to be worse.  Ideally, it would use an average,
but that would require actually sorting the results.
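
To illustrate the trade-off between the combiners (a sketch only; the
function names are hypothetical and this is not the actual search
code): `min' and `max' always return one of the original integer
scores, so results can still be dropped into a fixed set of groups,
while an average can land between them.

```javascript
// Each function takes the per-term scores (integers 0..4) for one entry.
// Assumes at least one search term, so the array is never empty.
function combineMin(scores) {
  return Math.min(...scores);   // 0 as soon as any term fails to match
}

function combineMax(scores) {
  return Math.max(...scores);   // one good term hides the bad ones
}

function combineAvg(scores) {
  const sum = scores.reduce((a, b) => a + b, 0);
  return sum / scores.length;   // may be fractional, e.g. 3.5
}

// An entry that matches the first term exactly but not the second:
const oneMiss = [4, 0];
const minOneMiss = combineMin(oneMiss);
const maxOneMiss = combineMax(oneMiss);

// An exact match on one term and a prefix match on the other:
const avgMixed = combineAvg([4, 3]);
```

With `min', the [4, 0] entry gets 0 and is dropped, so every term has
to match; with `max' it would rank alongside entries that match both
terms exactly.  The average of [4, 3] is 3.5, which fits none of the
integer groups, hence the need for a real sort.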

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.