[racket-dev] Potential search improvement

From: Michael Wilber (mwilber at uccs.edu)
Date: Tue May 29 15:05:01 EDT 2012

I just noticed something about the way I use search.

Just now, I wanted to find the reference docs that describe threads. So
I typed "thread" into the search box and got documentation about the
(thread ...) form, in both the old and new search pages.

But the reference describes "Threads", not "thread", so as soon as I
thought to add the final "s" to my query, the new search immediately
pulled the correct document up. (This is a big improvement on the old
search, which presented me with "call-with-killing-threads" and
"callbacks for blocked threads" and so forth.)

Should search take such common suffixes into account?
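
One way to picture this is the minimal sketch below. It is not the actual
search code: suffixVariants and matchesWithSuffixes are hypothetical names,
and the only suffix handled is a trailing "s", which is an assumption made
to keep the example small.

```javascript
// Hypothetical helpers, not part of the real Racket search code.

// Return the query plus one variant with a trailing "s" added or removed.
function suffixVariants(term) {
  const variants = new Set([term]);
  if (term.endsWith("s") && term.length > 2) {
    variants.add(term.slice(0, -1)); // "threads" -> "thread"
  } else {
    variants.add(term + "s");        // "thread" -> "threads"
  }
  return [...variants];
}

// True if any variant of the query equals the entry title, ignoring case.
function matchesWithSuffixes(query, title) {
  const t = title.toLowerCase();
  return suffixVariants(query.toLowerCase()).some((v) => v === t);
}
```

With this, a query of "thread" would count as an exact match for an entry
titled "Threads", while still matching the (thread ...) form itself.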

On Tue, 29 May 2012 07:17:16 -0400, Eli Barzilay <eli at barzilay.org> wrote:
> I have made a possibly useful improvement to the JS search code.
> It's not pushed, yet, but I dropped the revised JS code on the
> pre-built pages so you can try it out here:
>
>   http://pre.racket-lang.org/docs/html/search/
>
> and compare searches with the usual page:
>
>   http://docs.racket-lang.org/search/
>
> I'd appreciate people playing with it to find out about potential
> problems with the ordering and possibly with different browsers.
>
>
> ** More about the change (especially if you want to try to improve
>    things):
>
> This is not real ranking, but it should give better results overall.
> The thing is that the search assigns a small integer "score" for each
> term, where the scores are (roughly)
>
>   0 no match,
>   1 match-all-subword-parts,
>   2 contains a match,
>   3 matches a prefix,
>   4 exact match.
>
> The thing is that they used to be lumped into two groups, with exact
> matches first.  Now I made each of these be in its own group, so
> there's a little more order.  To see an example that works nicely now
> try "splay".
>
> This doesn't solve all problems...  To see problematic things (that
> Neil has complained about in the past) try:
>
>   * "port" (gives precedence for exact matches, but the reference
>     entries are better; better now with the chapters appearing right
>     after the exact binding matches).
>
>   * "fold" (same problem, where it could be argued that for most
>     people "foldl" from `racket/base' is better than "fold" from the
>     DMdA languages and `srfi/1').
>
> Some of the problem comes from having no preferences for the results.
> Such preferences are not hard to implement, but they connect two
> unrelated pieces of code (the score assignments in the JS search, and
> the bonus for each manual) and it can quickly get into sticky
> questions.
>
> Another aspect of the problem is that there are N search terms, not
> just one.  Currently, the per-term scores are combined with a `min';
> a `max' tends to be worse.  Ideally, it would use an average, but
> that would require actually sorting the results.
>
> --
>           ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                     http://barzilay.org/                   Maze is Life!
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev
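
The scoring scheme described in the quoted message can be sketched roughly
as follows. This is an illustration only, not the code in the Racket docs:
termScore, entryScore and groupByScore are hypothetical names, and the rule
used for score 1 (every subword part of the term appears somewhere in the
entry name) is a guess at what "match-all-subword-parts" means.

```javascript
// Hypothetical sketch of the 0..4 per-term score; not the real search code.
function termScore(term, name) {
  if (name === term) return 4;         // exact match
  if (name.startsWith(term)) return 3; // matches a prefix
  if (name.includes(term)) return 2;   // contains a match
  // Assumed meaning of "match-all-subword-parts": split the term on
  // non-alphanumeric characters and require every part to occur in the name.
  const parts = term.split(/[^a-z0-9]+/i).filter(Boolean);
  if (parts.length > 1 && parts.every((p) => name.includes(p))) return 1;
  return 0;                            // no match
}

// Combine the per-term scores with min, as the message describes.
function entryScore(terms, name) {
  if (terms.length === 0) return 0;
  return Math.min(...terms.map((t) => termScore(t, name)));
}

// Put each entry in the bucket for its score and emit the buckets from
// best to worst. Entries that score 0 are dropped. No sort is involved.
function groupByScore(terms, names) {
  const buckets = [[], [], [], [], []];
  for (const name of names) buckets[entryScore(terms, name)].push(name);
  return buckets.slice(1).reverse().flat();
}
```

For example, groupByScore(["fold"], ["foldl", "fold", "unfold", "map"])
gives "fold", "foldl", "unfold", in that order. Because `min' over small
integers is itself a small integer, one bucket per score is enough. An
average would presumably produce fractional scores, which fits the remark
that it would require actually sorting the results.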

Posted on the dev mailing list.