[plt-scheme] Compression dictionary

Mon Oct 5 17:07:11 EDT 2009

One thought...

The JS can make an AJAX request for the search file and the web server
can serve it gzipped, because most browsers support on the fly
decompression. Then that file would be received by JS in plain-text
and it could parse/eval it.

Jay

On Mon, Oct 5, 2009 at 2:39 PM, Eli Barzilay <eli at barzilay.org> wrote:
> On Oct  5, Jens Axel Søgaard wrote:
>> Hmm. Tricky.
>>
>> You said that LZW worked best on large text, but that you couldn't
>> afford large strings. Is it an idea to build a dictionary on the
>> server, by running the LZW compression algorithm on all the strings
>> (thus having a large text), and then start the LZW decompression
>> with a non-empty dictionary?
>
> Perhaps being very concrete would help...  Here's a fragment from the
> JS index (which is part of the code that does the help search):
>
>  ["disposition-read","../net/mime.html#(def._((lib._net/mime..ss)._disposition-read))","<span class=\"ScmSym\"><span class=\"ScmValLink\">disposition-read</span></span>",["net/mime"]],
>  ["disposition-size","../net/mime.html#(def._((lib._net/mime..ss)._disposition-size))","<span class=\"ScmSym\"><span class=\"ScmValLink\">disposition-size</span></span>",["net/mime"]],
>  ["disposition-type","../net/mime.html#(def._((lib._net/mime..ss)._disposition-type))","<span class=\"ScmSym\"><span class=\"ScmValLink\">disposition-type</span></span>",["net/mime"]],
>
> It has a lot of strings that are repeated -- and this translates to a
> much bigger file as well as a much longer time for the browser's JS
> engine to parse this source and run it.  I'm looking for a way to find
> a good set of words that I can use at the time where this index is
> created, so that the overall file is smaller.  Using something like
> LZW on each string is not effective -- because LZW becomes effective
> when you give it more text.  (This is essentially why tgz will often
> outperform (older) zip files by compressing a whole directory as one
> unit.)
>
> When I have such a set of strings (phrases), I can do a simple
> substitution of occurrences of these strings with pointers into an
> array that holds the phrases.  So overall the array would be smaller
> (much, hopefully), which would reduce the time you wait for the search
> page to come up -- while keeping the decoding effort (which is done in
> javascript) to a simple process of substituting such references with
> words found in the phrases vector.
>
> --
>          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                    http://barzilay.org/                   Maze is Life!
> _________________________________________________
>  For list-related administrative tasks:
>  http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>

-- 
Jay McCarthy <jay at cs.byu.edu>
Assistant Professor / Brigham Young University
http://teammccarthy.org/jay

"The glory of God is Intelligence" - D&C 93