[racket-dev] Potential search improvement
On May 29, 2012, at 11:53 AM, Eli Barzilay wrote:
> Three hours ago, Sam Tobin-Hochstadt wrote:
>> On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay <eli at barzilay.org> wrote:
>>> Just now, Sam Tobin-Hochstadt wrote:
>>>> I think you probably want to rank/divide '1' here based on how
>>>> much of the identifier is matched by the search. For example, if
>>>> you search for 'current-sep-line', you probably want
>>>> 'current-line-sep' first, but currently you get
>>>> 'current-alist-line-sep' first.
>> [...]
>>
>> Getting away from the discussion on sorting speed, I don't think my
>> suggestion even requires sorting: just add a 1.5 for
>> match-all-subword-parts-to-whole-id.
>
> That won't work, since "current-line-sep" will have the all-subword
> match for both entries. The first one is whatever comes first in the
> alphabetically sorted index. You can see the same problem with a
> search for "current sep line".
I thought Sam's original suggestion was that, when you get an all-subword match, you weight it by the ratio of the matched length to the whole-entry length. Thus, in the example in question, "current-line-sep" would get a weight of 1.0 but "current-alist-line-sep" only 14/19 ≈ 0.74 (or something like that, depending on how you count the hyphens). That still doesn't require any sorting, and the precise numbers don't matter, only their ordering.
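
To make the arithmetic concrete, here is a rough sketch of that weighting in Python. It is not the actual search code: the function name `subword_weight`, the splitting of the query on hyphens and spaces, the plain substring test, and the choice to leave hyphens out of the lengths are all assumptions made for illustration.

```python
import re

def subword_weight(query, entry):
    """Hypothetical helper: matched letters divided by entry letters.

    Returns 0.0 unless every query subword occurs somewhere in the entry.
    Hyphens are left out of both lengths (an assumption; counting them
    changes the numbers slightly but not the ordering in this example).
    """
    words = [w for w in re.split(r"[-\s]+", query) if w]
    if not words or not all(w in entry for w in words):
        return 0.0
    matched = sum(len(w) for w in words)
    total = len(entry.replace("-", ""))
    # Cap at 1.0 in case repeated or overlapping query words overcount.
    return min(1.0, matched / total)

entries = ["current-alist-line-sep", "current-line-sep"]

# One pass over the candidates picks the best match; no sort is needed.
best = max(entries, key=lambda e: subword_weight("current-sep-line", e))
```

Under these assumptions "current-line-sep" scores 14/14 and "current-alist-line-sep" scores 14/19, so the shorter identifier wins even though it comes later in the alphabetical index.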
Stephen Bloch
sbloch at adelphi.edu