[racket-dev] Potential search improvement

From: Stephen Bloch (bloch at adelphi.edu)
Date: Tue May 29 13:25:20 EDT 2012

On May 29, 2012, at 11:53 AM, Eli Barzilay wrote:

> Three hours ago, Sam Tobin-Hochstadt wrote:
>> On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay <eli at barzilay.org> wrote:
>>> Just now, Sam Tobin-Hochstadt wrote:
>>>> I think you probably want to rank/divide '1' here based on how
>>>> much of the identifier is matched by the search.  For example, if
>>>> you search for 'current-sep-line', you probably want
>>>> 'current-line-sep' first, but currently you get
>>>> 'current-alist-line-sep' first.
>> [...]
>> 
>> Getting away from the discussion on sorting speed, I don't think my
>> suggestion even requires sorting: just add a 1.5 for
>> match-all-subword-parts-to-whole-id.
> 
> That won't work, since "current-line-sep" will have the all-subword
> match for both entries.  The first one is whatever comes first in the
> alphabetically sorted index.  You can see the same problem with a
> search for "current sep line".

I thought Sam's original suggestion was that, when you get an all-subword match, you weight it by the ratio of the matched length to the whole-entry length.  Thus in the example in question, "current-line-sep" would get a weight of 1.0 but "current-alist-line-sep" only 14/19=0.74 (not counting the hyphens; counting them gives 16/22=0.73).  That still doesn't require any sorting, and the precise numbers don't matter, only their ordering.
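
To make the arithmetic concrete, here is a rough sketch of that weighting in Python. It is not the actual search code: the function name subword_weight is made up for illustration, and it assumes that subwords are split on hyphens, that a query subword must equal an identifier subword exactly, and that hyphens are left out of the length count.

```python
def subword_weight(query: str, identifier: str, sep: str = "-") -> float:
    """Weight an all-subword match by matched length / whole-entry length.

    Hypothetical helper: returns 0.0 unless every subword of the query
    equals a distinct subword of the identifier. Separators are not counted.
    """
    # Treat spaces in the query like separators, so "current sep line" works too.
    query_parts = [p for p in query.replace(" ", sep).split(sep) if p]
    id_parts = [p for p in identifier.split(sep) if p]

    remaining = list(id_parts)
    matched = 0
    for part in query_parts:
        if part not in remaining:
            return 0.0  # not an all-subword match
        remaining.remove(part)  # each identifier subword is used at most once
        matched += len(part)

    total = sum(len(p) for p in id_parts)
    return matched / total if total else 0.0


candidates = ["current-alist-line-sep", "current-line-sep"]
# A single pass that keeps the highest weight; no sort of the index is needed.
best = max(candidates, key=lambda c: subword_weight("current-sep-line", c))
print(best)  # current-line-sep
```

With these assumptions "current-line-sep" scores 14/14 and "current-alist-line-sep" scores 14/19, so the shorter identifier wins regardless of where it falls in the alphabetical index.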


Stephen Bloch
sbloch at adelphi.edu



Posted on the dev mailing list.