[plt-scheme] Compression dictionary

From: Eli Barzilay (eli at barzilay.org)
Date: Mon Oct 5 16:39:16 EDT 2009

On Oct  5, Jens Axel Søgaard wrote:
> Hmm. Tricky.
> 
> You said that LZW worked best on large text, but that you couldn't
> afford large strings. Is it an idea to build a dictionary on the
> server, by running the LZW compression algorithm on all the strings
> (thus having a large text), and then start the LZW decompression
> with a non-empty dictionary?

Perhaps being very concrete would help...  Here's a fragment from the
JS index (which is part of the code that does the help search):

  ["disposition-read","../net/mime.html#(def._((lib._net/mime..ss)._disposition-read))","<span class=\"ScmSym\"><span class=\"ScmValLink\">disposition-read</span></span>",["net/mime"]],
  ["disposition-size","../net/mime.html#(def._((lib._net/mime..ss)._disposition-size))","<span class=\"ScmSym\"><span class=\"ScmValLink\">disposition-size</span></span>",["net/mime"]],
  ["disposition-type","../net/mime.html#(def._((lib._net/mime..ss)._disposition-type))","<span class=\"ScmSym\"><span class=\"ScmValLink\">disposition-type</span></span>",["net/mime"]],

It has a lot of strings that are repeated -- and this translates to a
much bigger file as well as a much longer time for the browser's JS
engine to parse this source and run it.  I'm looking for a way to find
a good set of words that I can use at the time where this index is
created, so that the overall file is smaller.  Using something like
LZW on each string is not effective -- because LZW becomes effective
when you give it more text.  (This is essentially why tgz will often
outperform (older) zip files by compressing a whole directory as one
unit.)

When I have such a set of strings (phrases), I can do a simple
substitution of occurrences of these strings with pointers into an
array that holds the phrases.  So overall the array would be smaller
(much, hopefully), which would reduce the time you wait for the search
page to come up -- while keeping the decoding effort (which is done in
javascript) to a simple process of substituting such references with
words found in the phrases vector.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the users mailing list.