[plt-dev] file/zip file sizes

From: Eli Barzilay (eli at barzilay.org)
Date: Thu Mar 5 17:42:56 EST 2009

On Mar  5, Dave Gurnell wrote:
> Hi all,
> 
> I remember someone mentioning the size of ZIP files created with
> file/zip a while back. I can't remember the context, though, and I
> can't find the post in my archives.
> 
> Anyway, I just created a lot of ZIP files with file/zip, and
> unzipping and rezipping them on the command line saved a lot of
> space.
> 
> Does anyone remember the resolution of the original ZIP-related
> post?

There was none.  The zip functionality is implemented in
collects/mzlib/zip.ss, which uses collects/mzlib/deflate.ss to do the
compression work.  That last module is a direct translation of the
gzip code (and as a result, it's not as fast as you could get with
more idiomatic Scheme code).

It does have a LEVEL definition which corresponds to the compression
level argument for gzip, and I tried to tweak that to see what
happens.  To my surprise, I discovered a bug that I introduced a long
time ago -- which made it do almost no compression (or maybe none at
all).  But the result still had a valid format, so it wasn't discovered.
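
To illustrate how such a bug can go unnoticed, here is a minimal sketch
in Python using the standard zlib module.  It is not the mzlib code and
does not reproduce the actual bug; it only shows that a deflate stream
can be well-formed while doing no compression at all.  The sample data
is an assumption standing in for a real text file.

```python
import zlib

# Repetitive sample text standing in for a large text file (an assumption).
data = b"the quick brown fox jumps over the lazy dog\n" * 2000

# Level 0 emits "stored" blocks: a well-formed stream with no compression.
stored = zlib.compress(data, 0)
# Level 6 is zlib's usual default trade-off between speed and size.
packed = zlib.compress(data, 6)

# Both streams decompress to the original input, so a round-trip test
# cannot tell them apart; only comparing sizes reveals the difference.
roundtrip_ok = (zlib.decompress(stored) == data
                and zlib.decompress(packed) == data)
print(roundtrip_ok, len(data), len(stored), len(packed))
```

A test that only checks that the output unpacks correctly passes in both
cases, which is why a size comparison is the useful check here.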

I fixed that, and made some other minor improvements (mainly changed
more vectors to byte strings -- the code is old enough that it precedes
byte strings).  A quick summary of gzipping a large (10M) text file on
my machine:

Before the bug fix:
  cpu time: 3879; size: 4938648 = 49.4%

After fixing it:
  cpu time: 4497; size: 2226952 = 22.2%

After switching to use more byte strings:
  cpu time: 4373

And when tweaking the LEVEL I get:
  level = 1 -- cpu time: 3153; size: 2545842 = 25.5%
  level = 9 -- cpu time: 9844; size: 2195097 = 22.0%

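So it looks like there is no point in changing the level: level 9 more
than doubles the cpu time for a 0.2-point gain in size, and level 1
saves under a third of the time at a cost of about 3 points.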
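
The same kind of comparison can be sketched with Python's zlib module.
This is only an approximation of the measurement above: it uses zlib
rather than mzlib/deflate.ss, the helper name `measure` is made up for
the example, and the input is synthetic text, so the numbers will not
match the ones reported here.

```python
import random
import time
import zlib

def measure(data, level):
    """Return (cpu_seconds, compressed_size, ratio_percent) for one level."""
    start = time.process_time()
    packed = zlib.compress(data, level)
    elapsed = time.process_time() - start
    assert zlib.decompress(packed) == data  # output must still round-trip
    return elapsed, len(packed), 100.0 * len(packed) / len(data)

# Synthetic word soup standing in for a large text file (an assumption).
rng = random.Random(0)
vocab = [b"zip", b"deflate", b"scheme", b"vector",
         b"byte", b"string", b"level", b"gzip"]
data = b" ".join(rng.choice(vocab) for _ in range(200_000))

# Compare the fastest, the default, and the strongest compression levels.
results = {level: measure(data, level) for level in (1, 6, 9)}
for level, (cpu, size, pct) in results.items():
    print(f"level = {level} -- cpu time: {cpu:.3f}s; size: {size} = {pct:.1f}%")
```

Reporting both the time and the size ratio for each level makes it easy
to see whether a higher level is worth its extra cpu time on a given
kind of input.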

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the dev mailing list.