[BusyBox] My cleaned up bunzip.

Rob Landley rob at landley.net
Wed Oct 15 01:57:37 UTC 2003


On Tuesday 14 October 2003 18:05, Glenn McGrath wrote:
> On Tue, 14 Oct 2003 15:00:54 -0500
>
> Rob Landley <rob at landley.net> wrote:
> > Okay, I need about one more day of polishing, but the attached
> > bunzip.c file works if you do this:
> >
> > Now I've got to write up some documentation on how bunzip works,
> > although I hope this source is readable.  All 560 lines of it. :)

I heard back from Julian Seward.  He doesn't have a problem with me using a 
different license as long as the attribution paragraph I put at the top 
stays.  So I think I'll dual licensed it under the LGPL and OSL.

> In regard to the crc table, as far as i remember, gzip uses adler32
> which is a different algorithm than "crc32", its faster and almost as
> good.

jseward says that the CRC algorithm is "AUTODIN", which is the one used for 
ethernet, which is basically a big endian version of the the same basic crc 
algorithm in gzip.  Alexander van Heukelum emailed me a three line 
implementation from a CRC FAQ that matches, and I cleaned it up a bit.  Cut 
another 100 lines off the source, and then out of sheer vanity I removed some 
blank lines around comments (and added a few more comments).  New total: 453 
lines, which compiles to a 6504 byte executable (when stripped).

> Also, im pretty sure there are both crc32 and crc32b algorithms, not sure
> what different about them.

big/little endian.

There's a half dozen different ways you can abuse crc32.  You can initialize 
to 0 or -1, you can invert all the bits at the end or not, and you can do it 
big or little endian.

I sent you the first thing that worked.  Didn't mean I was through my to-do 
list. :)

> Excellent work, major improvemnt.

Thanks.

It's still a little over 3 seconds slower extracting the 2.6.0-test6 kernel 
tarball (81.5 seconds vs 78 seconds for the old one on my laptop), but oh 
well.  I'm tired.  Ship it.

(Half the reason I did this was so other people could bang on it without 
divine intervention.  Time for them to figure out how to optimize the thing.  
Although I do suspect the memory usage could definitely be brought down from 
4 megs.  It's those top 3 bytes of dbuf, used for the burrows-wheeler sorting 
vector. that's eating most of it.  But I'm not awake enough to think about it 
right now...)

Here's the code.  This is more or less ready to go in.  I'm going to stop 
looking at it for a while.  All you should have to do is chop off the main() 
at the end and call it from bunzip2, and life should otherwise be good. :)

Tomorrow, if I feel industrious, I start on bzip compress.  Email me if you 
find anything really stupid in the code, I've been a bit sleep deprived the 
past couple days. :)

> Glenn

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bunzip.c
Type: text/x-csrc
Size: 15916 bytes
Desc: not available
Url : http://lists.busybox.net/pipermail/busybox/attachments/20031014/a07a6aaa/attachment.c 


More information about the busybox mailing list