Proposition: use a hashtable instead of bsearch to locate applets

Rich Felker dalias at libc.org
Fri May 30 18:07:52 UTC 2014


On Fri, May 30, 2014 at 12:13:41PM +0100, Laurent Bercot wrote:
> 
> On 05/30/2014 10:16 AM, Bartosz Gołaszewski wrote:
> >I've checked the times just by looking up all the applets in a loop
> >and measuring the time using gettimeofday() - the results are: ~220
> >microseconds for bsearch and ~150 microseconds for hashtable on my
> >linux laptop. Is it significant? I think so. Is it noticeable?
> >Probably not. The idea came to me, when thinking about unifying the
> >hashtables used in busybox. I guess you're right and it isn't really
> >worth the size increase.
> 
>  I think it would definitely be a worthwhile improvement if all busybox
> is doing was looking up the applets in a loop. ;)
>  A more realistic test, however, would be to fork+exec a trivial applet
> (true, for instance) in a loop, and measure the times with bsearch vs.
> with the hash table. And I'm certain you'll find that the difference
> becomes quite insignificant.

The lowest time I've ever seen for exec (not even counting fork;
measured from immediately before the exec syscall to entry into main)
is over 200µs, and can easily exceed 1ms with dynamic linking. So even
if the program did nothing but applet lookup and exit, I think we'd be
looking at a performance increase of ~30% at best.

Realistically, as soon as you throw in some actual work and other
syscalls that actually do something, I suspect the difference to be
less than 5%.

If there is a desire to change the way applet lookup is done, I would
suggest trying to make it optimal in both size and performance. 150µs
is not impressive at all. A perfect hash constructed at build time for
the list of busybox applet-name strings should make it possible to do
the applet lookup in ~1µs with trivial code/table size (all constant
in .text).

Rich


More information about the busybox mailing list