[BusyBox] Re: Where's the bzip2 compressed linux-kernel patch?
rob at landley.net
Wed Oct 22 22:01:39 UTC 2003
On Wednesday 22 October 2003 07:48, Jörn Engel wrote:
> On Wed, 22 October 2003 06:13:50 -0500, Rob Landley wrote:
> > On Tuesday 21 October 2003 08:15, Jörn Engel wrote:
> > The sort should more or less be pluggable, so what I may do is just
> > abstract it out into a sort() method, feed that to the library
> > implementation of quicksort for now, and come back after I've got the
> > rest of it converted to C and see what speedups are worth the increase in
> > executable size.
> Makes sense. Comlicated stuff with a simple interface - except that
> Julian passed all sorts of unnecessary mess to the sort functions as
I'm noticing that, yes. :)
> > Looking at the block sorting machinery, I've largely been trying to see
> > if I could throw any of it away without losing too much speed, or at
> > least refactor it to make the fallback order a bit more readily apparent.
> > I don't expect to be able to fundamentally improve on the sorting theory
> > end of things without spending several months in the library, which I
> > haven't got time for just now. However, the decompress code was so
> > complicated and crufty that it hid a good dozen major thinkos, which
> > allowed some serious simplification after it was all untangled. There
> > might be similar opportunities on the compression side without actually
> > changing the theory on which it operates.
> Very likely. The code still contains debugging output, but hasn't
> been changed for quite a while, as it reached bugfix-only mode.
> Something doesn't match, imo. :)
I haven't had any Copious Free TIme so far today, but I expect to remain in
"throwing stuff out" mode at least through friday, before getting around to
serious "reorganizing stuff".
There's 8 gazillion asserts, which should probably be comments. There are an
awful lot of tests that translate to "I don't trust the C compiler". This
has all the hallmarks of a theoretical mathematician trying to defend himself
from the evils of an implementation ever having to run on a real machine that
could have a bit flipped by cosmic rays... :)
> > One question I'm still rooting around for an answer to is if the first
> > batch of run length compression (the one undone after the burrows-wheeler
> > detransform during decompress) is done before the sorting in order to
> > mitigate 3.1 and 3.2 from your list. I think so, but I haven't found it
> > in the compress code yet. (I will, I was just busy on the decompression
> > side this evening, and have had about my fill of bzip for one day
> > already. I'll pick it up again tomorrow.)
> I owe you one! For some reason, I never noticed the run length
> encoding and kept wondering how Julian's implementation could be so
> quick. All the papers on efficient block sorting claimed things like
> O(n*ln(n)) for *any* data, were complicated beyond any sense and I
> couldn't figure out, how this could even be *possible*.
Glad I could help. :)
> > I'm all for it, but if code isn't available I'm not going to pay any
> > attention to it.
> This will take time, more than a year, likely. The design is still
> evolving, and I have to figure out if I can squeeze some academic
> title out of this. As long as people get them for obvious bullshit
> like the above, why not. :)
I look forward to seeing it then, but the bzip conversion should be done LONG
before then. (By the end of the month, I hope. Don't hold me to that.)
> It is not lossless anymore, though. I can imagine that they simply
> pre-generate an LZ77 or LZ78 dictionary with common strings. The LZ
> algorithms start compressing the second identical string, with a
> pre-defined dictionary, you can already compress the first.
> Drawback is a larger compressor/decompressor executable, but that is
> just one, compared to many smaller payload files.
I thought you could use the open decompressor with the closed compressor.
(And has anybody tested that it IS lossless? I thought all they tested was
that it runs. Elf files often have a bit of cruft in them that doesn't get
loaded into memory. Since the upx decompressor thing basically takes the
place of the executable loader, there's room for fudging it, isn't there?)
Of course since it IS closed, I have no idea what it's really doing. This is
just a guess based on knowledge of a thousand ways to pull a fast one while
claiming to be brilliant. (I have a black belt in cheap plastic solutions.
> But that only works for the inflate side. For compression, you need
> 7.2, plus original buffer, plus some overhead. More like 9M,
True. (Actually, I'll take your word for it. I'm still a few days away from
having any idea how much memory this stuff actually needs. Real Life hath
> > That said, you're not getting it below 1 megabyte per stream no matter
> > what you do, because the intermediate buffer has to store 900k before the
> > burrows-wheeler transform can be undone, and you CAN'T decompress until
> > you have a complete block.
> Right. You surely don't want too many of those running in parallel.
Depends. You could overlap emptying the intermediate buffer (dbuf) with
filling up the next intermediate buffer. (The first is streaming, the second
is random.) And if your SMP system has 4 megs of local L2 cache per
processor, you can parallel the blocks to your hearts' content...
This is for a monster server, though. A linksys box wouldn't want to waste
the electricity and heat budget even for hyper-threading...
> > This is not designed to be a streaming protocol. (The latency on output
> > is simarly horked: you try flushing the output after 4k of data, and your
> > compression ratio is going to be hamstrung.) You're talking about an
> > application for this that does not interest me.
> Also agreed. Sometimes it shows that 6 month ago, I was still
> completely unaware about compression. On the other hand, the reasons
> to put multiple compression algorithms into the kernel are growing
> with time. Swap compression will make sense someday, as one example.
I think it already does. But gzip, not bzip,and you'd probably want to group
pages together into about 64k chunks. (I forget how often the dictionary
gets flushed normally anyway. My last run through the deflate code was
around 1997. It's on my to-do list...)
> Or UPX, to name a (hopefully) lossless algorithm. Compression always
> makes assumptions about the data, but some assumptions are more
> general than others. Gzip and bzip2 are the most generalized in
> actual use, afaik. I'll try to add a third, but I might just as well
> fail. Your (including Manuel) work still helps me, working with bzip
> is a pain, and gzip is even much worse.
> Speaking of it, have you done something similar for gzip already?
I ported the compression side to java 1.0 in 1997. A cleanup of gzip code in
busybox is on my to-do list after I get bzip compression done, but it'll be a
More information about the busybox