[plt-scheme] firefox locking up 15 seconds for each plt documentation search

From: Eli Barzilay (eli at barzilay.org)
Date: Sun Jun 7 07:20:46 EDT 2009

On Jun  7, Noel Welsh wrote:
> On Sat, Jun 6, 2009 at 10:48 PM, Neil Van Dyke<neil at neilvandyke.org> wrote:
> > Does anyone else's entire Firefox 3 lock up completely for around
> > 15 seconds *every* time they either load the PLT "Search Manuals"
> > or enter a query in the "...search manuals..." field that's at the
> > top of each page?
> 
> I assume this is the index loading. I keep one tab open on the
> search page, and open results in other tabs. This is much faster.

FWIW, I do the same.

[Read on if you're interested in trying a possible solution, otherwise
you can safely ignore this.]

I know that the search page is slow to load -- but I didn't know what
caused the unreasonable freezes that Neil was having.  I don't use
firebug, and looking at the kind of debugging it does, it might do
some errortrace-like annotations that would make the index load much
slower.  This might be made faster if the index is stored as a giant
string instead of a giant array -- but parsing that string after it's
loaded will probably mean a 2x slowdown or worse.  (And I won't be
surprised if browsers decide that a 3mb string means that the script
is malicious.)

As for trying to speed it up, I can't think of any new tricks to do that.
If you look at "doc/search/plt-index.js" you'll see that there are
several encodings used -- most urls are shortened if they're in the
expected place, and the many <span>s are encoded as vectors.  And to
make things more fun, the PLT index keeps growing (in v4.1 it was
2.5M, and now it's close to 3M -- and that's only for the core tree).
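
To illustrate the kind of encodings mentioned above, here is a minimal
Python sketch.  It is not the actual "plt-index.js" format: the prefix,
the marker character, the span class name, and the function names are
all assumptions made up for the example.  It only shows the general
idea of dropping a common url prefix and storing <span>s as nested
vectors that get expanded when needed.

```python
# Hypothetical sketch: this is NOT the real plt-index.js encoding.
EXPECTED_PREFIX = "../reference/"  # assumed "expected place" for most urls
MARKER = ">"                       # assumed flag meaning "prefix was stripped"


def shorten(url):
    """Drop the common prefix and flag the entry with MARKER."""
    if url.startswith(EXPECTED_PREFIX):
        return MARKER + url[len(EXPECTED_PREFIX):]
    return url


def expand(entry):
    """Inverse of shorten: restore the prefix for flagged entries.

    Assumes no unshortened url starts with MARKER.
    """
    if entry.startswith(MARKER):
        return EXPECTED_PREFIX + entry[len(MARKER):]
    return entry


def render(item):
    """Turn a nested [css_class, child, ...] vector back into <span> markup.

    No HTML escaping is done; this is only an illustration.
    """
    if isinstance(item, str):
        return item
    css_class, *children = item
    body = "".join(render(child) for child in children)
    return '<span class="%s">%s</span>' % (css_class, body)
```

With these definitions, shorten("../reference/pairs.html") gives
">pairs.html", and expand() restores the original url.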

Together with the *many* problems of IE with local-file-scripts, it
might make more sense to switch the whole thing to use an alternative
approach -- an obvious example is to not include the docs in the
distribution and instead use the on-line help, which will make it
possible to use server-side scripts.  (I'm not sure if it's feasible
for the server to deal with that kind of load though, and this comes
in addition to the usual problem of people with no network or with
firewall-limited connections.)

Other options:

* Something that was tried in the past is a local "help" webserver.
  In a sense, this kind of a solution would be ideal, since it lets
  Scheme do the searching, making it possible to get rid of the
  harder-to-write-and-debug JS code, and some of the associated hacks.
  But there are problems with this:
  - The browser is independent from DrScheme, so such a server needs
    to keep running.  This makes the instructions of how to use the
    whole thing more complicated -- since there should be some way to
    stop the background running server, etc.
  - Such a process will not have a small footprint, so it will be
    problematic for smaller machines.
  - Both of these problems might be hacked around somehow by making
    DrScheme run the server -- and on the HTML side do some JS hacking
    that will detect when the server is not there, and show a message
    saying that you need to have DrScheme running.
  - The next thing to tackle is multiple DrSchemes, possibly with
    different versions.  This might be done by making the port depend
    on the version, but it still means that if I'm unfortunate enough
    to run DrScheme first on a dept server, then my process will do
    the searching for everyone else.
  - And to make things more fun -- there's also `plt-help', and just
    typing `help' in mzscheme.  The first can just leave itself
    running, telling the user about it.  In mzscheme, `help' could try
    to contact a running server and start its own if needed (similarly
    to the DrScheme situation, so the code will be shared).
  - And in addition there's a good number of smaller problems, like
    not being able to start a server due to some OS problems (eg, a
    different process listening on the same port, firewall issues,
    etc).
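
The version-dependent port and the "contact a running server, start
your own if needed" idea could look roughly like the following Python
sketch.  The base port, the port range, and the function names are
hypothetical, and nothing here is existing PLT code; it only shows a
stable version-to-port mapping plus a probe for a listening server.

```python
import socket
import zlib

BASE_PORT = 20000  # hypothetical base port, not one that PLT uses
PORT_RANGE = 1000  # hypothetical number of ports set aside for help servers


def port_for_version(version):
    """Map a version string to a stable port in the reserved range.

    Two different versions can still collide on the same port.
    """
    return BASE_PORT + zlib.crc32(version.encode("utf-8")) % PORT_RANGE


def server_running(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port.

    This cannot tell a help server apart from an unrelated process
    that happens to listen on the same port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A caller would compute port_for_version() for its own version, check
server_running() on that port, and start a server only when the check
fails.  The sketch does nothing about the shared-machine case, where
the first user's process ends up serving everyone else.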

* Yet another wild idea that was considered is to embed some browser.
  For example, include some browser with PLT (obviously, very bad), or
  use some OS hooks to do so (obviously, very difficult given the
  different platforms, and dealing with different browsers on each
  platform, and trying to get Windows to not just use IE to be polite
  etc).

* Another possible *partial* solution is to make `F1' in DrScheme (as
  well as `plt-help' and MzScheme's `help') do the searching itself.
  This is certainly doable -- and was even the way it worked before
  the JS code was written.  It would have the nice effect that you
  "pay" for the JS page only if you use it.  However, there are some
  big problems with this:
  - The search code will be implemented in two places, using two
    different languages.  Keeping them synced will be very difficult,
    and not doing so will be very bad (when people discover that
    there's a different way to search, which produces different
    results).  BTW, the reason it will be difficult is that the search
    code is not at all trivial -- it does some scoring of the results,
    with different weights according to different factors, etc.
  - The interface to the search results will not be the same, since
    the scheme-based thing will just show a static page with results.
    I suppose that it's possible to make that static page have the
    same interface as the JS search so it appears the same, but load
    the index on demand (that is, when you change the query), but
    hacking something like that up is ... frightening.
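
To give a feel for why keeping two search implementations in sync is
hard, here is a toy Python sketch of weighted result scoring.  The
weights and the match categories are invented for the example and are
not the factors used by the real search code; the point is only that
any small difference in such rules changes the result order.

```python
# Hypothetical weights: the real search code's factors are not shown here.
WEIGHTS = {"exact": 100, "prefix": 50, "substring": 10}


def score(term, name):
    """Score one query term against one index entry name."""
    term, name = term.lower(), name.lower()
    if name == term:
        return WEIGHTS["exact"]
    if name.startswith(term):
        return WEIGHTS["prefix"]
    if term in name:
        return WEIGHTS["substring"]
    return 0


def search(query, names):
    """Rank names by the summed score of all query terms; drop non-matches."""
    terms = query.split()
    scored = [(sum(score(t, n) for t in terms), n) for n in names]
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for total, name in ranked if total > 0]
```

Changing a single weight or tie-breaking rule in one implementation
but not the other is enough to make the two searches disagree.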

* A variation on the above: implement the Scheme search, and get rid
  of the JS search.  This means no more dynamic searching, since it
  is effectively reverting back to the pre-JS days.

* A variation on this variation, which might be feasible: no JS
  search, but compensate by making the Scheme side search into some
  widget that will show the dynamically found pages.  Getting the
  Scheme-based browser is not a good option for the obvious reasons,
  but here we're talking about a classic GUI application.  The main
  problems that I see with this:
  - "Return of the Help Desk".  This is likely to have a sequel: "The
    Help Desk Strikes Back".  For example -- `plt-help' and MzScheme's
    `help' are textual things.  Possible solution is to have some
    textual interface.  Given the nature of the problem, the way I
    imagine such an interface is likely to be lame enough to cause
    people some 80s-style flashbacks.  (Making that Star Wars pun
    more appropriate...)
  - To clarify -- I don't mean the whole help desk as it existed in
    pre-v4 days.  I'm just referring to the help desk as another
    component in the PLT suite.  An actual implementation will
    basically be just a search widget, perhaps with some way to mimic
    the toplevel set of pages.
  - No more searching on-line (which turned out to be more successful
    than I estimated) -- you want to find something in the docs, you
    need PLT installed, period.  Possible solution is to have a search
    thing running on the server -- something that slaps a modern
    html/ajax/whatever skin on top of that 80s TUI.

I'm happy to hear any other ideas people might have.  (Note that I'm
writing this sentence here, so it applies to people who read all of
that...)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.