[BusyBox] Re: Where's the bzip2 compressed linux-kernel patch?

Jörn Engel joern at wohnheim.fh-wedel.de
Mon Oct 20 14:33:51 UTC 2003

On Mon, 20 October 2003 04:41:57 -0500, Rob Landley wrote:
> On Monday 20 October 2003 03:31, Jörn Engel wrote:
> > On Sat, 18 October 2003 15:38:35 -0500, Rob Landley wrote:
> >
> > Does anyone really need the "small mode"?  If not, I consider it a
> > feature to not support it. :)
> The "fast mode" walks a 4 megabyte array (okay, 900k*4=3.6 megabytes) in a way 
> that makes the cpu cache break down and cry, and actually brings dram bank 
> switching latency into the picture.  (It's about the worst case scenario for 
> memory access; fetching whole cache lines for randomly ordered 4 byte 
> reads...)  Anything that doesn't do that seems like it's at least worth a 
> look, but it's not a high priority just now...;)

I should take a closer look at the code sometime.  I've written a very
data-local implementation of the block sorting myself.  My problem was
to sort *really big* blocks, bigger than the L4-cache (aka main
memory).  Locality really helps when suffering from swap latency. :)
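For reference, the block sorting at the heart of bzip2 can be sketched
naively as sorting all rotations of the block and keeping the last
column.  A minimal Python illustration (the textbook form, not the
data-local variant described above -- real implementations sort
suffixes with much better memory behavior):

```python
def bwt(block: bytes) -> bytes:
    """Naive Burrows-Wheeler transform: sort every rotation of the
    block, emit the last column.  O(n^2 log n) and cache-hostile --
    for illustration only."""
    assert b"\x00" not in block      # use NUL as a unique end marker
    s = block + b"\x00"
    n = len(s)
    order = sorted(range(n), key=lambda i: s[i:] + s[:i])
    return bytes(s[(i - 1) % n] for i in order)

# Equal bytes cluster into runs in the output, which is what makes
# the later move-to-front + Huffman stages effective.
print(bwt(b"banana"))
```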

4MB should also not matter too much as a memory requirement.  Either
you need compression anyway, or you are early on in the boot sequence
with lots of free memory.

> Any future uber-xeon that has a 4 megabyte L2 cache is going to do bunzip 
> _really_ fast. :)

Wouldn't running busybox on that machine be kinda pointless maybe? ;)

> It turns out Manuel Novoa is busy beating the source code to death with a 
> profiler, which is why I'm waiting on the bzip-kernel patch.  He's sped it up 
> another 10% or so since last time (extracting the kernel tarball took 22 
> seconds in the original, and he's got it down under 18 now), but hasn't 
> checked the result into busybox's CVS yet.  (He's still tweaking.)

Nice to see such work. :)

> No rush.  The kernel patch is on hold pending delivery of Manuel's updated 
> bunzip code.  The ball's in his court at the moment.

see below.

> > You don't - by any chance - plan the same thing for the compression
> > side, do you?  I still have vague plans to improve the algorithm a bit,
> > so a clean codebase would be nice to have.
> That's what I'm working on now.  Yesterday, I glued all the bits necessary to 
> bzip compress stdin to stdout into one 95k C file (which compiled to 
> a ~50k executable).  So far this evening I've trimmed that down to 79k of 
> source and 48k of executable, and I'm still at the "crapectomy" stage.  Not 
> actually doing serious reorganization type cleanup yet, just taking a scalpel 
> (okay, vi) and removing bits that shouldn't BE there.  (The line "typedef 
> BZFILE void;" should not occur in any program anywhere ever.  It just 
> shouldn't.  "#define BZ_API(func) func" and "#define BZ_EXTERN extern" aren't 
> really a good sign either, although they were the result of making the code 
> compile on windows.  Unwinding this kind of thing is very time consuming.)

I've started doing the same thing, but got distracted at some point.
Look here:

All the cleanup patches might look quite familiar to you. ;)

My killer application for this was and still is jffs2.  But tests
showed that bzip2 actually compresses worse than gzip when running on
chunks of <=4k.  I've pretty much figured out the reason by now and am
writing up my findings in a paper.  Hope that IBM allows me to follow
up on that and create a new compression suite that is actually better
than bzip2.  We'll see.
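The small-chunk effect is easy to reproduce with Python's stdlib
bindings (a rough check, not the jffs2 measurement): bzip2 carries a
fixed per-stream overhead (stream header, block header, Huffman table
definitions) that dominates on a 4k chunk, so zlib tends to win there,
most visibly on degenerate input:

```python
import bz2
import zlib

chunk = (b"jffs2 stores small nodes; " * 160)[:4096]  # 4 KiB text sample
run = b"a" * 4096                                     # degenerate 4 KiB chunk

for name, data in (("text", chunk), ("run", run)):
    z = zlib.compress(data, 9)
    b = bz2.compress(data, 9)
    print(f"{name}: raw={len(data)}  zlib={len(z)}  bz2={len(b)}")

# Both must round-trip, of course:
assert zlib.decompress(zlib.compress(chunk, 9)) == chunk
assert bz2.decompress(bz2.compress(chunk, 9)) == chunk
```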

> Not really relevant to the kernel, though.  Ask on the busybox list if you 
> want to know more. :)

Will do.  CC'd.

> Which patches are you referring to?


> I'm going to take the shepard patch and make a 2.6 version with the new bunzip 
> code as soon as Manuel gets tired of making it faster.  But he hasn't yet, 
> and I've got plenty of other things to do in the meantime...

Sure.  I've had the same plan, and already started, but found some
questions that are still unanswered:
- With zlib, bzlib and possibly more, is lib/* still a good place?
- How can we reduce the memory consumption of, say, 1000 open IPsec
  connections, each eating up a zlib workspace?
- How much code can be shared among different compression algorithms?
- Some users will compress block-based, others need streaming
  support.  Can these two classes share code?  How?

And so on.  This is definitely 2.7 work, but it doesn't hurt to start
somewhere and fix problems on the way.  Feel free to port your
reimplementation to the kernel, less work for me. :)
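On the workspace question above, one obvious direction is pooling: a
bounded set of preallocated contexts shared by all connections instead
of one workspace each.  A toy user-space sketch of the idea in Python
(names and sizes are made up here; a kernel version would preallocate
zlib workspaces and guard the pool with a lock):

```python
import queue
import zlib

class CompressorPool:
    """Bounded pool of compression 'workspaces'.  N connections share
    `size` slots instead of holding one workspace each; callers block
    until a slot is free, capping total memory consumption."""

    def __init__(self, size: int = 4, level: int = 6):
        self.level = level
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(object())  # token standing in for a workspace

    def compress(self, data: bytes) -> bytes:
        token = self._free.get()      # may block: at most `size` in flight
        try:
            # A fresh compressobj per call keeps streams independent;
            # the pooled token models the scarce workspace memory.
            c = zlib.compressobj(self.level)
            return c.compress(data) + c.flush()
        finally:
            self._free.put(token)

pool = CompressorPool(size=2)
out = pool.compress(b"payload " * 100)
assert zlib.decompress(out) == b"payload " * 100
```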


Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
