[BusyBox] Re: [PATCH] Re: Sigh... this time i attached the file (bunzip-4.c)
Jörn Engel
joern at wohnheim.fh-wedel.de
Fri Oct 24 09:45:09 UTC 2003
On Fri, 24 October 2003 00:28:28 -0500, Rob Landley wrote:
> On Thursday 23 October 2003 11:01, Jörn Engel wrote:
> > > When reading from a filehandle, we can easily figure out how much we
> > > overshot and lseek backwards a bit. The only question is should this be
> > > in the gzip/bzip library code, or in the code that calls it?
> >
> > It should definitely be in the library, completely transparent to the
> > application, imo.
> >
> > Not exactly sure what bzip2 uses the overshoot for,
>
> We read 4k at a time from input into a buffer, so get_bits can return an
> arbitrary number of bits of input when asked. If we needed to do a syscall
> to get the next byte, it would slow the implementation down tremendously.
>
> The problem with it being in the library is that lseek has no effect when
> doing "cat file.tbz | progname". You can't seek backwards on stdin...
Sounds like my guess was wrong: you were not referring to the same
'overshoot' I had in mind. :)
Not sure then. It might be an idea to change the interface slightly,
letting the user provide a read function. If the user is working from
RAM, from a file with more things appended to the .bz2 data or more
complicated stuff, it is the user's problem. If it is just a simple
.bz2 file, one more function call shouldn't hurt performance much.
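
To make the idea concrete, here is a minimal sketch of a bit reader that
pulls its input through a user-supplied read function. It is not the
actual bunzip code: the names read_fn, bit_reader, get_bits, overshoot,
mem_src and mem_read are all hypothetical, and the only detail taken from
the thread is the 4k input buffer. The sketch assumes at most 24 bits are
requested per call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: fill buf with up to len bytes from the source.
 * Returns the number of bytes delivered, 0 at end of input, <0 on error. */
typedef long (*read_fn)(void *ctx, unsigned char *buf, size_t len);

struct bit_reader {
    read_fn read;            /* supplied by the caller */
    void *ctx;               /* opaque state handed back to read() */
    unsigned char buf[4096]; /* 4k input buffer, as described in the thread */
    size_t len, pos;         /* valid bytes in buf, next unread byte */
    unsigned int bits;       /* bit accumulator, most significant bit first */
    int nbits;               /* number of unconsumed bits in the accumulator */
};

/* Fetch n bits (assumed 0 <= n <= 24) into *out. Returns 0, or -1 on EOF/error. */
static int get_bits(struct bit_reader *br, int n, unsigned int *out)
{
    while (br->nbits < n) {
        if (br->pos == br->len) {
            /* Buffer exhausted: one callback per 4k block, not per byte. */
            long got = br->read(br->ctx, br->buf, sizeof br->buf);
            if (got <= 0)
                return -1;
            br->len = (size_t)got;
            br->pos = 0;
        }
        br->bits = (br->bits << 8) | br->buf[br->pos++];
        br->nbits += 8;
    }
    br->nbits -= n;
    *out = (br->bits >> br->nbits) & ((1u << n) - 1);
    return 0;
}

/* Whole bytes that were fetched from the source but not yet consumed. */
static size_t overshoot(const struct bit_reader *br)
{
    return (br->len - br->pos) + (size_t)(br->nbits / 8);
}

/* Example source: compressed data that already sits in RAM. */
struct mem_src {
    const unsigned char *data;
    size_t len, pos;
};

static long mem_read(void *ctx, unsigned char *buf, size_t len)
{
    struct mem_src *m = ctx;
    size_t left = m->len - m->pos;

    if (len > left)
        len = left;
    memcpy(buf, m->data + m->pos, len);
    m->pos += len;
    return (long)len;
}
```

With this shape the library never calls lseek itself. A caller reading
from a seekable file could seek back by overshoot() bytes once
decompression ends, while a caller reading from a pipe or from RAM would
deal with the leftover bytes in whatever way suits its source.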
> > My (ugly) solution was to use char[2*900000], so my memory footprint
> > is 10n, instead of 9n. I guess that the overshoot is just a little
> > smarter, leaving some corner cases in but saving most of the memory I
> > wasted.
>
> It sounds like your footprint was 18n, actually...
Nope, I merely copied the plain text, not the suffix arrays.
Jörn
--
The cost of changing business rules is much more expensive for software
than for a secretary.
-- unknown