[plt-scheme] Compression dictionary

From: Jens Axel Søgaard (jensaxel at soegaard.net)
Date: Mon Oct 5 13:42:18 EDT 2009

2009/10/5 Eli Barzilay <eli at barzilay.org>:
> On Oct  5, Jens Axel Søgaard wrote:
>> 2009/10/5 Eli Barzilay <eli at barzilay.org>:
>> > Does anyone know of a good and simple method for building a dictionary
>> > for compression?
>>
>> > Explanation: The documentation index file was ~2mb initially, and now
>> > it's up to 3mb.  In addition, some things I did for compression make
>> > loading it slower (like nested arrays which are used like Sexprs) so
>> > I'm revising the whole thing.
>>
>> > Example:
>>
>> > "foo_bar"
>> >  "meh_blah_foo_blah"
>>
>> I understand the tokens are "foo", "bar", "meh", and "blah".
>
> Well, I'm working with just raw strings -- trying to get meaningful
> tokens is going down "regexp-hell"...  So in that example I had
> "_blah" as a token in one example, and "foo_" in the other.

Okay, so exactly which tokens the algorithm uses is not as important
as fast decoding.
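
To make the decoding side concrete, here is a minimal Python sketch of
substring-dictionary coding. The token table, the choice of private-use
code points as token markers, and the function names are all assumptions
made up for illustration; this is not the format of the real index file.

```python
# Hypothetical token table: raw substrings, not "meaningful" words.
TOKENS = ["_blah", "foo_"]

# Token i is stored as the single code point BASE + i.
# Assumption: the input never contains private-use code points itself.
BASE = 0xE000

def encode(s, tokens=TOKENS):
    # Naive greedy substitution, longest token first.
    for i, tok in sorted(enumerate(tokens), key=lambda p: -len(p[1])):
        s = s.replace(tok, chr(BASE + i))
    return s

def decode(s, tokens=TOKENS):
    # Decoding is one table lookup per character, with no searching.
    out = []
    for ch in s:
        idx = ord(ch) - BASE
        out.append(tokens[idx] if 0 <= idx < len(tokens) else ch)
    return "".join(out)

for text in ["foo_bar", "meh_blah_foo_blah"]:
    packed = encode(text)
    print(len(text), "->", len(packed), decode(packed) == text)
```

The encoder can be as slow or as clever as it likes, since it runs once
when the index is built; only the decoder runs in the browser.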

Is it possible to make a back-of-the-envelope calculation
with respect to compression rate, download time, and
decoding time?
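
Such a calculation could look like the Python sketch below. Only the
3 MB index size comes from this thread; the bandwidth, the compression
ratio and the decoder throughput are placeholder assumptions, not
measurements, and load_time is a hypothetical helper.

```python
def load_time(transfer_bytes, bandwidth, decoded_bytes=0, decode_rate=None):
    """Seconds to download transfer_bytes, plus seconds to decode
    them into decoded_bytes of output (if a decode_rate is given)."""
    download = transfer_bytes / bandwidth
    decode = decoded_bytes / decode_rate if decode_rate else 0.0
    return download + decode

RAW = 3_000_000          # index size of about 3 MB, as stated in the thread
BANDWIDTH = 250_000      # assumed 2 Mbit/s link, in bytes per second
RATIO = 0.25             # assumed compression ratio of a custom scheme
DECODE_RATE = 5_000_000  # assumed decoder output, in bytes per second

plain = load_time(RAW, BANDWIDTH)
packed = load_time(RAW * RATIO, BANDWIDTH, RAW, DECODE_RATE)
print(f"uncompressed: {plain:.1f} s, compressed: {packed:.1f} s")
```

With these made-up numbers the download dominates, so a better ratio
pays off even with a fairly slow decoder; a faster link or a slower
decoder shifts the balance.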

Just to get a feeling of the sizes involved:

jasmacair:tmp jensaxelsoegaard$ ls -las index.html
7960 -rw-r--r--  1 jensaxelsoegaard  wheel  4071868  4 Okt 19:33 index.html
jasmacair:tmp jensaxelsoegaard$ gzip index.html
jasmacair:tmp jensaxelsoegaard$ ls -las index.html.gz
648 -rw-r--r--  1 jensaxelsoegaard  wheel  330511  4 Okt 19:33 index.html.gz

The original file is 4071868 bytes and the gzipped version
is only 330511 bytes. The gzipped version is thus only about 8% of the original.
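
The arithmetic, spelled out in Python with the two sizes from the
listing above:

```python
original = 4071868  # bytes, index.html
gzipped = 330511    # bytes, index.html.gz

ratio = gzipped / original
print(f"{ratio:.1%} of the original, a saving of {1 - ratio:.1%}")
# prints "8.1% of the original, a saving of 91.9%"
```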

Question: Does the PLT web server support on-the-fly gzip compression?

I suppose it does (I think I saw a gzip-stuffer somewhere).

Is it used for docs.plt-scheme.org?

NB: The 8% might not be directly applicable, since the file contains a
lot of HTML markup, which is repetitive and compresses very well.
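
The effect is easy to demonstrate with Python's gzip module on synthetic
data. The random 8-letter "entries" and the markup wrapped around them
are invented for this sketch and have nothing to do with the real index,
so only the direction of the difference is meaningful, not the numbers.

```python
import gzip
import random

def gzip_ratio(data: bytes) -> float:
    # Compressed size as a fraction of the input size (level 6 is
    # the default of the gzip command-line tool).
    return len(gzip.compress(data, compresslevel=6)) / len(data)

random.seed(0)
letters = "abcdefghijklmnopqrstuvwxyz"
words = ["".join(random.choice(letters) for _ in range(8))
         for _ in range(2000)]

# The same entries, once bare and once wrapped in repetitive markup.
bare = "\n".join(words).encode()
marked_up = "\n".join(
    f'<li class="entry"><a href="{w}.html">{w}</a></li>' for w in words
).encode()

print(f"bare: {gzip_ratio(bare):.0%}, with markup: {gzip_ratio(marked_up):.0%}")
```

The markup inflates the uncompressed file but costs little once
gzipped, which makes the ratio look better than it would for the
bare data.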

-- 
Jens Axel Søgaard


Posted on the users mailing list.