[plt-scheme] Compression dictionary
2009/10/5 Eli Barzilay <eli at barzilay.org>:
> On Oct 5, Jens Axel Søgaard wrote:
>> 2009/10/5 Eli Barzilay <eli at barzilay.org>:
>> > Does anyone know of a good and simple method for building a dictionary
>> > for compression?
>>
>> > Explanation: The documentation index file was ~2mb initially, and now
>> > it's up to 3mb. In addition, some things I did for compression make
>> > loading it slower (like nested arrays which are used like Sexprs) so
>> > I'm revising the whole thing.
>>
>> > Example:
>>
>> > "foo_bar"
>> > "meh_blah_foo_blah"
>>
>> I understand the tokens are "foo", "bar", "meh", and, "blah".
>
> Well, I'm working with just raw strings -- trying to get meaningful
> tokens is going down "regexp-hell"... So in that example I had
> "_blah" as a token in one example, and "foo_" in the other.
Okay, so the actual tokens used by the algorithm are not as important
as fast decoding is.
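For what it's worth, zlib's preset-dictionary support is one cheap way to get the effect of a token dictionary without tokenizing anything: you hand the compressor a blob of substrings you expect to recur, and early occurrences in the data can back-reference into it. A minimal Python sketch, where the sample strings and the dictionary are made up from the examples in this thread (not the real index data):

```python
import zlib

# Hypothetical sample entries, echoing the thread's examples.
entries = b"foo_bar\nmeh_blah_foo_blah\nblah_foo\nbar_meh_blah\nfoo_blah_bar\n"

# Preset dictionary: a guess at substrings likely to recur.
zdict = b"foo_bar_meh_blah"

def deflate(data, zdict=None):
    if zdict is None:
        c = zlib.compressobj(9)
    else:
        # zdict is the 6th argument of zlib.compressobj.
        c = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                             8, zlib.Z_DEFAULT_STRATEGY, zdict)
    return c.compress(data) + c.flush()

plain = deflate(entries)
primed = deflate(entries, zdict)

# The decompressor must be given the same dictionary.
d = zlib.decompressobj(zdict=zdict)
assert d.decompress(primed) + d.flush() == entries

# With a good dictionary the primed stream is typically smaller,
# since first occurrences of tokens become back-references.
print(len(plain), len(primed))
```

Decoding stays fast because it is still plain DEFLATE; the dictionary only changes what the match window starts out containing.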
Is it possible to make a back-of-the-envelope calculation
with respect to compression rate, download time, and
decoding time?
Just to get a feeling of the sizes involved:
jasmacair:tmp jensaxelsoegaard$ ls -las index.html
7960 -rw-r--r-- 1 jensaxelsoegaard wheel 4071868 4 Okt 19:33 index.html
jasmacair:tmp jensaxelsoegaard$ gzip index.html
jasmacair:tmp jensaxelsoegaard$ ls -las index.html.gz
648 -rw-r--r-- 1 jensaxelsoegaard wheel 330511 4 Okt 19:33 index.html.gz
The original file size is 4071868 bytes and a gzipped version
is only 330511. The gzipped version is thus only 8% of the original.
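The arithmetic, as a quick sketch (the link speeds below are assumed figures for illustration, not measurements):

```python
# Figures from the ls output above.
original = 4_071_868   # bytes
gzipped  =   330_511   # bytes

ratio = gzipped / original
print(f"gzipped size: {ratio:.1%} of the original")

# Transfer-time estimates at some assumed link speeds (hypothetical).
for mbit_per_s in (1, 10):
    raw_s = original * 8 / (mbit_per_s * 1e6)
    gz_s  = gzipped  * 8 / (mbit_per_s * 1e6)
    print(f"{mbit_per_s:>2} Mbit/s: {raw_s:5.1f} s raw, {gz_s:5.1f} s gzipped")
```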
Question: Does the PLT web server support on-the-fly gzip compression?
I suppose it does (I think I saw a gzip-stuffer somewhere).
Is it used for docs.plt-scheme.org?
NB: The 8% might not be directly applicable, since the file contains a
lot of html.
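Whatever the PLT server does, the mechanism itself is just HTTP content negotiation: the client sends Accept-Encoding: gzip, the server compresses the body on the fly and answers with Content-Encoding: gzip, and the browser decompresses transparently. A toy round trip in Python, with a hypothetical body standing in for the HTML index:

```python
import gzip

# Hypothetical, repetitive response body (HTML compresses well).
body = b"<html>" + b"<a href='foo_bar'>foo_bar</a>\n" * 500 + b"</html>"

# Server side: compress before sending (Content-Encoding: gzip).
wire = gzip.compress(body, compresslevel=6)

# Client side: the browser does this step transparently.
assert gzip.decompress(wire) == body
print(f"{len(body)} bytes -> {len(wire)} bytes on the wire")
```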
--
Jens Axel Søgaard