[BusyBox] Re: [PATCH] Re: Sigh... this time i attached the file (bunzip-4.c)
rob at landley.net
Sat Oct 25 02:40:52 UTC 2003
On Friday 24 October 2003 12:32, Jörn Engel wrote:
> > Actually, the n*log(n) shell sort you suggested earlier works well enough
> > for my purposes. Sorting one 900k block of data (which will be after run
> > length compression, so it's usually rather a lot more data than that)
> > takes a little over 13 seconds of user time (worst case, all zeroes) and
> > 8.15 seconds tested with random data. (This is on a 900 mhz celeron
> > laptop; not unreasonably far off from the times bzip itself is getting,
> > although I haven't torn apart and instrmented bzip's block sorting stuff
> > yet...)
> Mine takes .4 seconds for random data, much better. With all zeros,
> it takes forever, >10m or worse. This can be improved. :)
I realise I'm trading off best case for worst case. I'll happily stick a
function in front of it that can fall back to the shell sort if it hasn't
produced a solution in ~1 second...
> Can you send me your code? I am not (yet) allowed to send you mine in
> return, but I can sure comment on it, after fixing my speed problem.
Let me finish debugging it first. Hopefully later this evening...
> > The shell sort uses no more data than the 900k of charcters and the index
> > array. I also whipped up a couple of GET_IDX and SET_IDX functions that
> > only use 20 bits for each index array entry instead of 32 (bringing the
> > total block sorting memory usage down to 3125000 bytes), but all the
> > extra anding, oring, and bit-shifting bloated the worst-case sort time to
> > 22 seconds for a 900k block of zeroes...
> I've already thought about using 64k blocks for things like jffs2,
> swap compression etc. 16 bits are not much slower. You could try
> using 24 bits and see how much speed you gain/lose vs. 20/32 bits.
Well, I know how to do a faster implementation, but the problem is unaligned
offsets. The following is trivial (and off the top of my head, so it may not
actually work. :)
inline int get_idx(char *array, int idx)
return ((*(int *)(array+((idx*20)/8)))>>((idx*20)&7))&((1<<20)-1);
The problem is there are some brain-damaged processors out there that can't do
an unaligned access, and busybox expects to be ported to them. Working with
24 bits would have a similar problem, I'd still need to do bit shifts if I
can't do an unaligned access.
So instead I'm doing (cut and paste):
/* Grab a value from array, which uses 20 bits per value instead of 32 */
inline int get_idx(unsigned int *array, int idx)
unsigned int temp,pos,offset;
inline void put_idx(unsigned int *array, int idx, int val)
unsigned int mask,pos,offset;
#define GET_IDX(list,i) get_idx(list,i)
#define PUT_IDX(list,i,j) put_idx(list,i,j)
#define STORAGE_SIZE(i) (((20*i)+31)/32)
#define GET_IDX(list,i) list[i]
#define PUT_IDX(list,i,j) list[i]=j
#define STORAGE_SIZE(i) i
Which is kind of evil, but I'm sure Manuel could tighten it up tremendously if
he set his mind to it. Me, I just wanted a proof of concept...
For right now, the 32 bit array-based versions are good enough. My laptop's
got over 190 something megs of ram in it, I'm not going to begrudge bzip 5
megs while I'm debugging it. :)
> > My first pass at bzip compression may not work as fast as jseward's, but
> > it'll have a heck of a lot less code, and I can bolt the
> > special-case-optimized sorting functions onto it later, after I get it to
> > work...
> Actually, you are impressively fast already, sounds almost as if you
> were cheating. Without cheating, this is a great basis.
Well, I am cheating. I'm on a 900 mhz coppermine celeron now. I used to do
my development work on a 366 mhz pentium II. Now THAT was a machine that
made you care how fast things ran. But it got it wet, so I had to upgrade.
(You're not supposed to feed 'em after midnight, either. Dunno what happens,
maybe they self-install windows or something...)
More information about the busybox