[BusyBox] Re: [PATCH] Re: Sigh... this time i attached the file (bunzip-4.c)

Jörn Engel joern at wohnheim.fh-wedel.de
Fri Oct 24 09:45:09 UTC 2003


On Fri, 24 October 2003 00:28:28 -0500, Rob Landley wrote:
> On Thursday 23 October 2003 11:01, Jörn Engel wrote:
> > > When reading from a filehandle, we can easily figure out how much we
> > > overshot and lseek backwards a bit.  The only question is should this be
> > > in the gzip/bzip library code, or in the code that calls it?
> >
> > It should definitely be in the library, completely transparent to the
> > application, imo.
> >
> > Not exactly sure what bzip2 uses the overshoot for,
> 
> We read 4k at a time from input into a buffer, so get_bits can return an 
> arbitrary number of bits of input when asked.  If we needed to do a syscall 
> to get the next byte, it would slow the implementation down tremendously.
> 
> The problem with it being in the library is that lseek has no effect when 
> doing "cat file.tbz | progname".  You can't seek backwards on stdin...

Sounds like my guess was wrong: you weren't referring to the same
'overshoot' I had in mind. :)

Not sure then.  It might be an idea to change the interface slightly,
letting the user provide a read function.  If the user is working from
RAM, from a file with more things appended to the .bz2 data or more
complicated stuff, it is the user's problem.  If it is just a simple
.bz2 file, one more function call shouldn't hurt performance much.
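
To make the read-function idea concrete, here is a rough sketch, not
code from the patch.  All names in it (read_fn, bit_reader, br_init,
br_get_bits, mem_src, mem_read) are made up for illustration.  The bit
reader only ever gets input through the callback, so whoever supplies
the callback decides where the data comes from and where it ends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: store up to len bytes in buf and return the
 * number stored; 0 means end of input. */
typedef size_t (*read_fn)(void *ctx, unsigned char *buf, size_t len);

/* Hypothetical bit reader that refills its 4k buffer via the callback. */
struct bit_reader {
    read_fn read;
    void *ctx;
    unsigned char buf[4096];
    size_t pos, len;     /* next unread byte, number of valid bytes */
    unsigned int bits;   /* bit accumulator, most significant bit first */
    int nbits;           /* number of valid bits in the accumulator */
};

static void br_init(struct bit_reader *br, read_fn read, void *ctx)
{
    memset(br, 0, sizeof(*br));
    br->read = read;
    br->ctx = ctx;
}

/* Return the next n bits (1 <= n <= 24), or -1 at end of input. */
static long br_get_bits(struct bit_reader *br, int n)
{
    assert(n >= 1 && n <= 24);
    while (br->nbits < n) {
        if (br->pos == br->len) {
            /* Buffer exhausted: one callback per 4k, not per byte. */
            br->len = br->read(br->ctx, br->buf, sizeof(br->buf));
            br->pos = 0;
            if (br->len == 0)
                return -1;
        }
        br->bits = (br->bits << 8) | br->buf[br->pos++];
        br->nbits += 8;
    }
    br->nbits -= n;
    return (long)((br->bits >> br->nbits) & ((1u << n) - 1));
}

/* Example source: a bounded region of RAM. */
struct mem_src {
    const unsigned char *data;
    size_t left;
};

static size_t mem_read(void *ctx, unsigned char *buf, size_t len)
{
    struct mem_src *m = ctx;
    size_t n = m->left < len ? m->left : len;

    memcpy(buf, m->data, n);
    m->data += n;
    m->left -= n;
    return n;
}
```

A caller working from RAM would set up a struct mem_src covering just
the compressed bytes and pass mem_read to br_init; a caller reading a
plain file would pass a thin wrapper around read(2) instead.  Data
appended after the .bz2 stream then becomes the callback's business
rather than the library's.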

> > My (ugly) solution was to use char[2*900000], so my memory footprint
> > is 10n, instead of 9n.  I guess that the overshoot is just a little
> > smarter, leaving some corner cases in but saving most of the memory I
> > wasted.
> 
> It sounds like your footprint was 18n, actually...

Nope, I merely copied the plain text, not the suffix arrays.

Jörn

-- 
The cost of changing business rules is much more expensive for software
than for a secretary.
-- unknown


