[BusyBox] My cleaned up bunzip.
Rob Landley
rob at landley.net
Wed Oct 15 01:57:37 UTC 2003
On Tuesday 14 October 2003 18:05, Glenn McGrath wrote:
> On Tue, 14 Oct 2003 15:00:54 -0500
>
> Rob Landley <rob at landley.net> wrote:
> > Okay, I need about one more day of polishing, but the attached
> > bunzip.c file works if you do this:
> >
> > Now I've got to write up some documentation on how bunzip works,
> > although I hope this source is readable. All 560 lines of it. :)
I heard back from Julian Seward. He doesn't have a problem with me using a
different license as long as the attribution paragraph I put at the top
stays. So I think I'll dual licensed it under the LGPL and OSL.
> In regard to the crc table, as far as i remember, gzip uses adler32
> which is a different algorithm than "crc32", its faster and almost as
> good.
jseward says that the CRC algorithm is "AUTODIN", which is the one used for
ethernet, which is basically a big endian version of the the same basic crc
algorithm in gzip. Alexander van Heukelum emailed me a three line
implementation from a CRC FAQ that matches, and I cleaned it up a bit. Cut
another 100 lines off the source, and then out of sheer vanity I removed some
blank lines around comments (and added a few more comments). New total: 453
lines, which compiles to a 6504 byte executable (when stripped).
> Also, im pretty sure there are both crc32 and crc32b algorithms, not sure
> what different about them.
big/little endian.
There's a half dozen different ways you can abuse crc32. You can initialize
to 0 or -1, you can invert all the bits at the end or not, and you can do it
big or little endian.
I sent you the first thing that worked. Didn't mean I was through my to-do
list. :)
> Excellent work, major improvemnt.
Thanks.
It's still a little over 3 seconds slower extracting the 2.6.0-test6 kernel
tarball (81.5 seconds vs 78 seconds for the old one on my laptop), but oh
well. I'm tired. Ship it.
(Half the reason I did this was so other people could bang on it without
divine intervention. Time for them to figure out how to optimize the thing.
Although I do suspect the memory usage could definitely be brought down from
4 megs. It's those top 3 bytes of dbuf, used for the burrows-wheeler sorting
vector. that's eating most of it. But I'm not awake enough to think about it
right now...)
Here's the code. This is more or less ready to go in. I'm going to stop
looking at it for a while. All you should have to do is chop off the main()
at the end and call it from bunzip2, and life should otherwise be good. :)
Tomorrow, if I feel industrious, I start on bzip compress. Email me if you
find anything really stupid in the code, I've been a bit sleep deprived the
past couple days. :)
> Glenn
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bunzip.c
Type: text/x-csrc
Size: 15916 bytes
Desc: not available
Url : http://lists.busybox.net/pipermail/busybox/attachments/20031014/a07a6aaa/attachment.c
More information about the busybox
mailing list