[plt-dev] file/zip file sizes
On Mar 5, Dave Gurnell wrote:
> Hi all,
>
> I remember someone mentioning the size of ZIP files created with
> file/zip a while back. I can't remember the context, though, and I
> can't find the post in my archives.
>
> Anyway, I just created a lot of ZIP files with file/zip, and
> unzipping and rezipping them on the command line saved a lot of
> space.
>
> Does anyone remember the resolution of the original ZIP-related
> post?
There was none. The zip functionality is implemented in
collects/mzlib/zip.ss, which uses collects/mzlib/deflate.ss to do the
compression work. That last module is a direct translation of the
gzip code (and as a result, it's not as fast as you could get with
more idiomatic Scheme code).
It does have a LEVEL definition which corresponds to the compression
level argument for gzip, and I tried to tweak that to see what
happens. To my surprise, I discovered a bug that I introduced a long
time ago -- which made it do almost no compression (or maybe none at
all). But the result still had a valid format, so it wasn't discovered.
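To illustrate how output can be valid yet uncompressed, here is a minimal sketch using Python's standard zlib module. It is not the mzlib/deflate.ss code, and the actual bug may have behaved differently. Level 0 emits "stored" blocks, which any decompressor accepts, so nothing fails except the size. The sample data is made up for the example.

```python
import zlib

def stored_vs_compressed(data: bytes):
    """Return (stored_size, packed_size) for the same input.

    Level 0 writes uncompressed "stored" blocks; level 6 is an
    ordinary compression setting chosen for this illustration.
    """
    stored = zlib.compress(data, 0)
    packed = zlib.compress(data, 6)
    # Both streams are valid: each decompresses back to the input.
    assert zlib.decompress(stored) == data
    assert zlib.decompress(packed) == data
    return len(stored), len(packed)

# Hypothetical sample input, chosen only because it is repetitive.
sample = b"the quick brown fox jumps over the lazy dog\n" * 1000
stored_size, packed_size = stored_vs_compressed(sample)
print(f"input: {len(sample)}; stored: {stored_size}; packed: {packed_size}")
```

A round-trip test passes in both cases, so only a size check would catch a problem like this.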
I fixed that, and made some other minor improvements (mainly changed
more vectors to byte strings -- the code is old enough that it precedes
byte strings). A quick summary of gzipping a large (10M) text file on
my machine:
Before the bug fix:
cpu time: 3879; size: 4938648 = 49.4%
After fixing it:
cpu time: 4497; size: 2226952 = 22.2%
After switching to use more byte strings:
cpu time: 4373
And when tweaking the LEVEL I get:
level = 1 -- cpu time: 3153; size: 2545842 = 25.5%
level = 9 -- cpu time: 9844; size: 2195097 = 22.0%
So it looks like there is no point in changing the level.
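For anyone who wants to repeat this kind of measurement elsewhere, here is a rough sketch using Python's zlib module rather than mzlib/deflate.ss. The timings and sizes it prints will not match the numbers above. The helper names (make_text, measure, report) and the generated test data are made up for this example.

```python
import random
import time
import zlib

def make_text(n_bytes: int, seed: int = 0) -> bytes:
    """Build reproducible pseudo-text of roughly n_bytes (hypothetical test data)."""
    rng = random.Random(seed)
    words = [b"zip", b"deflate", b"scheme", b"vector", b"byte", b"string",
             b"level", b"file", b"size", b"compress", b"module", b"code"]
    parts, size = [], 0
    while size < n_bytes:
        word = rng.choice(words)
        parts.append(word)
        size += len(word) + 1  # +1 for the joining space
    return b" ".join(parts)

def measure(data: bytes, level: int):
    """Return (cpu_ms, compressed_size) for one compression level."""
    start = time.process_time()
    out = zlib.compress(data, level)
    cpu_ms = (time.process_time() - start) * 1000.0
    return cpu_ms, len(out)

def report(data: bytes, levels=(1, 6, 9)) -> None:
    """Print one line per level in the same shape as the summary above."""
    for level in levels:
        cpu_ms, size = measure(data, level)
        pct = 100.0 * size / len(data)
        print(f"level = {level} -- cpu time: {cpu_ms:.0f}; size: {size} = {pct:.1f}%")

if __name__ == "__main__":
    report(make_text(1_000_000))
```

Run it on a real file instead of the generated text to get numbers that mean something for your data.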
--
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!