[BusyBox] Re: [PATCH] Re: Sigh... this time i attached the file (bunzip-4.c)

Jörn Engel joern at wohnheim.fh-wedel.de
Thu Oct 23 16:01:01 UTC 2003


On Thu, 23 October 2003 00:34:06 -0500, Rob Landley wrote:
> On Wednesday 22 October 2003 17:12, Glenn McGrath wrote:
> > On Wed, 22 Oct 2003 09:57:35 -0600
> > mjn3 at codepoet.org (Manuel Novoa III) wrote:
> > > Of course, the analogy with read() does break down in another respect.
> > > Because read_bunzip() can read and buffer more than it needs, you
> > > might have unused data remaining in the bunzip buffer if there is data
> > > trailing the compressed block.  That is definitely an issue if you
> > > want to treat this as a "black box".
> >
> > This is an issue I have run into with the dpkg and dpkg-deb applets: in
> > .debs there are two .tar.gz files inside an ar archive, and if you read
> > too much for the first .tar.gz you've read past the next ar header, and
> > possibly into the next .tar.gz.
> >
> > (busybox supports .tar.bz2 inside debs, but it's non-standard)
> >
> > Hmm, I need to do some testing to verify its current behaviour.
> 
> When reading from a filehandle, we can easily figure out how much we overshot 
> and lseek backwards a bit.  The only question is should this be in the 
> gzip/bzip library code, or in the code that calls it?

It should definitely be in the library, completely transparent to the
application, imo.
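A minimal sketch of doing the "lseek backwards" step inside the library rather than in the caller. It assumes the decompressor keeps its input in a buffer with a fill level (`len`) and a consumption point (`pos`); `struct inbuf`, `inbuf_fill()` and `inbuf_finish()` are hypothetical names, not the actual bunzip-4.c interface. It only works on seekable descriptors; on a pipe `lseek()` fails (ESPIPE), so the caller would still need some other way to recover the trailing bytes.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical decompressor input state (not the real bunzip API). */
struct inbuf {
    int fd;
    unsigned char buf[4096];
    size_t len; /* bytes currently held in buf */
    size_t pos; /* bytes of buf already consumed by the decoder */
};

/* Refill the buffer. This may read past the end of the compressed
 * stream, e.g. into the next ar header of a .deb. */
static ssize_t inbuf_fill(struct inbuf *in)
{
    ssize_t n = read(in->fd, in->buf, sizeof in->buf);
    if (n > 0) {
        in->len = (size_t)n;
        in->pos = 0;
    }
    return n;
}

/* Called by the library itself when the compressed stream ends:
 * hand the unconsumed bytes back by seeking the descriptor backwards,
 * so the application sees the file position it would expect. */
static int inbuf_finish(struct inbuf *in)
{
    off_t unused = (off_t)(in->len - in->pos);
    if (unused == 0)
        return 0;
    if (lseek(in->fd, -unused, SEEK_CUR) == (off_t)-1)
        return -1; /* not seekable (pipe, socket): caller must cope */
    in->len = in->pos = 0;
    return 0;
}
```

With this arrangement the application just keeps calling `read()` on the same descriptor after the decompressor finishes; the overshoot never leaks out of the library.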

Not exactly sure what bzip2 uses the overshoot for, but I have done a
similar thing in my blocksort implementation and guess it's the same.
The problem is that you often need the next or the previous character
in the comparison.  If using a char[900000], you have to check all sorts
of corner cases, which hurts both the developer's brain and performance.

My (ugly) solution was to use char[2*900000], so my memory footprint
is 10n instead of 9n.  I guess that the overshoot is just a little
smarter, leaving some corner cases in but saving most of the memory I
wasted.
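To illustrate the corner case: in a block sort, comparing two rotations of the block reads the characters following each start position, and those wrap around at the end of the block. The sketch below shows three ways to compare rotations: plain modulo indexing (a wraparound check on every character), a buffer holding the block twice (the char[2*n] approach, no checks), and a partial overshoot where only the first `over` bytes are copied past the end. The function names are hypothetical, and the overshoot variant is modeled on the guess above, not taken from bzip2 or busybox source.

```c
#include <string.h>

/* Compare rotations starting at i and j of block[0..n-1], wrapping via modulo. */
static int cmp_rot_modulo(const unsigned char *block, size_t n, size_t i, size_t j)
{
    for (size_t k = 0; k < n; k++) {
        unsigned char a = block[(i + k) % n], b = block[(j + k) % n];
        if (a != b)
            return a < b ? -1 : 1;
    }
    return 0;
}

/* `twice` holds the block twice (2n bytes): every rotation is a
 * contiguous run, so a single memcmp() suffices. */
static int cmp_rot_doubled(const unsigned char *twice, size_t n, size_t i, size_t j)
{
    int c = memcmp(twice + i, twice + j, n);
    return (c > 0) - (c < 0);
}

/* `ext` holds n + over bytes, the last `over` being a copy of the first
 * `over` (assumes over <= n). The first `over` characters compare
 * without checks; only the rare longer matches fall back to modulo. */
static int cmp_rot_overshoot(const unsigned char *ext, size_t n, size_t over,
                             size_t i, size_t j)
{
    int c = memcmp(ext + i, ext + j, over);
    if (c != 0)
        return (c > 0) - (c < 0);
    for (size_t k = over; k < n; k++) {
        unsigned char a = ext[(i + k) % n], b = ext[(j + k) % n];
        if (a != b)
            return a < b ? -1 : 1;
    }
    return 0;
}
```

For "banana" with `over` = 2, the extended buffer is "bananaba" (8 bytes) instead of the 12 bytes of "bananabanana", and all three functions return the same ordering for every pair of rotations.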

If this is true, then it is just an optimization.  And no library
should require its users to go through pain for an optimization
internal to the library.

OK, does anyone know for certain what the overshoot is used for?

Jörn

-- 
Good warriors cause others to come to them and do not go to others.
-- Sun Tzu


