[BusyBox] Re: [PATCH] Re: Sigh... this time i attached the file (bunzip-4.c)

Rob Landley rob at landley.net
Sat Oct 25 02:40:52 UTC 2003


On Friday 24 October 2003 12:32, Jörn Engel wrote:
> > Actually, the n*log(n) shell sort you suggested earlier works well enough
> > for my purposes.  Sorting one 900k block of data (which will be after run
> > length compression, so it's usually rather a lot more data than that)
> > takes a little over 13 seconds of user time (worst case, all zeroes) and
> > 8.15 seconds tested with random data.  (This is on a 900 mhz celeron
> > laptop; not unreasonably far off from the times bzip itself is getting,
> > although I haven't torn apart and instrmented bzip's block sorting stuff
> > yet...)
>
> Mine takes .4 seconds for random data, much better.  With all zeros,
> it takes forever, >10m or worse.  This can be improved. :)

I realise I'm trading off best case for worst case.  I'll happily stick a 
function in front of it that can fall back to the shell sort if it hasn't 
produced a solution in ~1 second...

> Can you send me your code?  I am not (yet) allowed to send you mine in
> return, but I can sure comment on it, after fixing my speed problem.

Let me finish debugging it first.  Hopefully later this evening...

> > The shell sort uses no more data than the 900k of charcters and the index
> > array.  I also whipped up a couple of GET_IDX and SET_IDX functions that
> > only use 20 bits for each index array entry instead of 32 (bringing the
> > total block sorting memory usage down to 3125000 bytes), but all the
> > extra anding, oring, and bit-shifting bloated the worst-case sort time to
> > 22 seconds for a 900k block of zeroes...
>
> I've already thought about using 64k blocks for things like jffs2,
> swap compression etc.  16 bits are not much slower.  You could try
> using 24 bits and see how much speed you gain/lose vs. 20/32 bits.

Well, I know how to do a faster implementation, but the problem is unaligned 
offsets.  The following is trivial (and off the top of my head, so it may not 
actually work. :)

inline int get_idx(char *array, int idx)
{
	return ((*(int *)(array+((idx*20)/8)))>>((idx*20)&7))&((1<<20)-1);
}

The problem is there are some brain-damaged processors out there that can't do 
an unaligned access, and busybox expects to be ported to them.  Working with 
24 bits would have a similar problem, I'd still need to do bit shifts if I 
can't do an unaligned access.

So instead I'm doing (cut and paste):

#ifdef LESS_MEMORY_BUT_SLOWER
/* Grab a value from array, which uses 20 bits per value instead of 32 */
inline int get_idx(unsigned int *array, int idx)
{
  unsigned int temp,pos,offset;

  pos=(idx*20)/32;
  offset=(idx*20)&31;
  temp=array[pos]>>offset;
  if(offset>12) temp|=array[pos+1]<<(32-offset);
  return temp&((1<<20)-1);
}

inline void put_idx(unsigned int *array, int idx, int val)
{
  unsigned int mask,pos,offset;

  pos=(idx*20)/32;
  offset=(idx*20)&31;
  mask=(1<<20)-1;
//  val&=mask;
  array[pos]=(array[pos]&~(mask<<offset))|(val<<offset);
  if(offset>12)
    array[pos+1]=(array[pos+1]&~(mask>>(32-offset)))|(val>>(32-offset));
}
#define GET_IDX(list,i) get_idx(list,i)
#define PUT_IDX(list,i,j) put_idx(list,i,j)
#define STORAGE_SIZE(i) (((20*i)+31)/32)
#else
#define GET_IDX(list,i) list[i]
#define PUT_IDX(list,i,j) list[i]=j
#define STORAGE_SIZE(i) i
#endif

Which is kind of evil, but I'm sure Manuel could tighten it up tremendously if 
he set his mind to it.  Me, I just wanted a proof of concept...

For right now, the 32 bit array-based versions are good enough.  My laptop's 
got over 190 something megs of ram in it, I'm not going to begrudge bzip 5 
megs while I'm debugging it. :)

> > My first pass at bzip compression may not work as fast as jseward's, but
> > it'll have a heck of a lot less code, and I can bolt the
> > special-case-optimized sorting functions onto it later, after I get it to
> > work...
>
> Actually, you are impressively fast already, sounds almost as if you
> were cheating.  Without cheating, this is a great basis.

Well, I am cheating.  I'm on a 900 mhz coppermine celeron now.  I used to do 
my development work on a 366 mhz pentium II.  Now THAT was a machine that 
made you care how fast things ran.  But it got it wet, so I had to upgrade.  
(You're not supposed to feed 'em after midnight, either.  Dunno what happens, 
maybe they self-install windows or something...)

> Jörn

Rob





More information about the busybox mailing list