[BusyBox] Re: Where's the bzip2 compressed linux-kernel patch?

Jörn Engel joern at wohnheim.fh-wedel.de
Wed Dec 17 13:46:00 UTC 2003


On Wed, 22 October 2003 06:13:50 -0500, Rob Landley wrote:
> On Tuesday 21 October 2003 08:15, Jörn Engel wrote:
> 
> > 3.1. "aaaaaaaaaaaaaaaaaaaa..." - all characters identical - simple to
> >      catch, not a real problem.
> > 3.2. "aaaaaaaaaaaaaaaaaaaa...aaab" - a single character different.
> >      This is much harder to catch and optimize.  It is also a very
> >      real problem, as many binaries have large ranges of 0x00, so this
> >      needs to be fast.
> > 3.3. "abcdefghijklmnopq...abcdefghijklmnopq..." - harmless string
> >      concatenated to itself.  Also causes O(n^2) behaviour, but needs
> >      a completely different solution.
> > 3.4. "a" -> "aba" -> "abacaba" -> "abacabadabacaba" - nasty
> >      combinations of 3.2 and 3.3.  Special treatment for either one
> >      causes you to catch the other worst case here.  Hard.
> 
> Looking at the block sorting machinery, I've largely been trying to see if I 
> could throw any of it away without losing too much speed, or at least 
> refactor it to make the fallback order a bit more readily apparent.  I don't 
> expect to be able to fundamentally improve on the sorting theory end of 
> things without spending several months in the library, which I haven't got 
> time for just now.

You might be able to improve things very quickly.  Someone found this
nice piece of code:
http://www.cs.dartmouth.edu/~doug/sarray/sarray.tar

Featuring, among other things, sorting the suffix array in O(n*log(n))
for *any* data.  Haven't looked at the code closely yet, but I guess
it uses the Larsson-Sadakane Algorithm.

Alternatively, you could look at this paper:
http://citeseer.nj.nec.com/567924.html

Worst case performance should be about O(n*log(n)*n^x).  You can tune
x down below .5, maybe close to .1 or so.  Big feature is memory
consumption just slightly above 5n.

Jörn

-- 
There's nothing better for promoting creativity in a medium than
making an audience feel "Hmm ­ I could do better than that!"
-- Douglas Adams in a slashdot interview



More information about the busybox mailing list