[BusyBox] Re: [PATCH] Re: Sigh... this time i attached the file (bunzip-4.c)

Rob Landley rob at landley.net
Fri Oct 24 05:28:28 UTC 2003


On Thursday 23 October 2003 11:01, Jörn Engel wrote:
> > When reading from a filehandle, we can easily figure out how much we
> > overshot and lseek backwards a bit.  The only question is should this be
> > in the gzip/bzip library code, or in the code that calls it?
>
> It should definitely be in the library, completely transparent to the
> application, imo.
>
> Not exactly sure what bzip2 uses the overshoot for,

We read 4k at a time from input into a buffer, so get_bits can return an 
arbitrary number of bits of input when asked.  If we had to do a syscall 
to get each next byte, it would slow the implementation down tremendously.
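
Roughly what I mean, as a minimal sketch and not the code that's actually in 
cvs.  The names bit_reader and bit_reader_init are made up for illustration, 
and I'm assuming MSB-first bit order and requests of at most 24 bits:

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

#define IN_BUF_SIZE 4096            /* one read() syscall per 4k of input */

struct bit_reader {                 /* hypothetical name, for illustration */
    int fd;                         /* input file descriptor */
    unsigned char buf[IN_BUF_SIZE]; /* bytes read ahead from fd */
    size_t pos;                     /* next unread byte in buf */
    size_t len;                     /* number of valid bytes in buf */
    uint32_t bits;                  /* bit accumulator, newest bits lowest */
    int nbits;                      /* valid bits left in the accumulator */
};

static void bit_reader_init(struct bit_reader *br, int fd)
{
    br->fd = fd;
    br->pos = br->len = 0;
    br->bits = 0;
    br->nbits = 0;
}

/* Return the next n bits (1..24), MSB first, or -1 on EOF or read error. */
static long get_bits(struct bit_reader *br, int n)
{
    assert(n >= 1 && n <= 24);
    while (br->nbits < n) {
        if (br->pos == br->len) {
            /* Buffer drained: refill it with a single syscall. */
            ssize_t got = read(br->fd, br->buf, IN_BUF_SIZE);
            if (got <= 0)
                return -1;
            br->len = (size_t)got;
            br->pos = 0;
        }
        /* Shift one more buffered byte into the accumulator. */
        br->bits = (br->bits << 8) | br->buf[br->pos++];
        br->nbits += 8;
    }
    br->nbits -= n;
    return (long)((br->bits >> br->nbits) & ((1UL << n) - 1));
}
```

Whatever is left between pos and len when the compressed stream ends is the 
overshoot: bytes already consumed from the filehandle that belong to whoever 
reads it next.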

The problem with it being in the library is that lseek has no effect when 
doing "cat file.tbz | progname".  You can't seek backwards on stdin...
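
A sketch of the "lseek backwards" idea with the pipe case reported back to 
the caller.  give_back_overshoot is a hypothetical name, not something in 
the tree, and it assumes the caller knows how many buffered bytes it never 
used:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: push `unused` over-read bytes back to fd by seeking
 * backwards from the current position.  Returns 0 on success, -1 if the
 * descriptor can't seek; errno is ESPIPE for pipes, FIFOs and sockets.
 */
static int give_back_overshoot(int fd, size_t unused)
{
    assert(fd >= 0);
    if (unused == 0)
        return 0;               /* nothing was over-read */
    if (lseek(fd, -(off_t)unused, SEEK_CUR) == (off_t)-1)
        return -1;              /* e.g. stdin fed by "cat file.tbz |" */
    return 0;
}
```

When this returns -1 the extra bytes are gone from the descriptor for good, 
so the only way to keep them is to hand the leftover buffer to the caller.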

> but I have done a
> similar thing in my blocksort implementation and guess it's the same.
> The problem is that you often need the next or the previous character
> in the test.  If using a char[900000], you have to check all sorts of
> corner cases, which hurts developers' brains and performance.

The decompress code is in CVS now, and should be more or less finished, I'd 
guess...

> My (ugly) solution was to use char[2*900000], so my memory footprint
> is 10n, instead of 9n.  I guess that the overshoot is just a little
> smarter, leaving some corner cases in but saving most of the memory I
> wasted.

It sounds like your footprint was 18n, actually...

> If this is true, then it is just an optimization.  And no library
> should require its users to go through pain for an optimization
> internal to the library.
>
> Ok, does anyone know for certain, what the overshoot is used for?

On the compress side?  Still working...

> Jörn

Rob


