[BusyBox] builtin echo for ash

Sun Aug 7 20:27:22 UTC 2005

Shaun Jackman <sjackman at gmail.com> writes:
> On 8/5/05, Paul Fox <pgf at brightstareng.com> wrote:
>> i've implemented a builtin echo command in ash.  i didn't have to
>> do much -- i moved the guts of the echo applet into libbb, and
>> now call bb_echo() from both echo.c and ash.c.  so the code size
>> impact of having both the applet and the builtin is minimal.
>> 
>> it seems to work, and on our platform (400mhz MPC5200, with small
>> caches) is at least 60x faster.  (running a script of 4000 echo
>> statements takes under a second with the builtin, versus about
>> one minute otherwise.)
>
> I'd guess the primary reason for the performance difference here is
> the fork(2) call.

The following small program:

-----------------------------
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

void chld_handler(int signo)
{
	(void)signo;
	while (waitpid(-1, NULL, WNOHANG) > 0);
}

int main(void)
{
	unsigned i;

	signal(SIGCHLD, chld_handler);

	i = 40000;
	do if (fork() == 0) {
		puts("don't speculate -- profile");
		_exit(0);
	} while (--i);

	return 0;
}
-----------------------------
(assuming BSD semantics for signal(), i.e. the handler stays installed)

takes 46.15s wallclock time, with 4.05s in user space and 42.1s in
system space, running on the 200MHz ARM926EJ-S board sitting next to
me (stdout redirected to /dev/null). This means that each iteration of
the loop (including the necessary waits) 'costs' me about 0.0011s of
CPU time spent executing kernel code and 0.0001s of CPU time spent
executing user code (Linux 2.4.31 / uClibc). OTOH, each of Paul's
echos appears to cost him something like 0.015s wallclock time on a
board with twice the clock rate of ours, meaning it executes about 13
times slower than a single iteration of the loop above over here.

> The typical solution in *nix was to provide a shell built-in as Paul
> has done. Busybox is in the unique position of being able to provide
> another solution, namely optionally providing a bb_fork() via
> longjmp implementation. echo wouldn't need to be a special case;
> every applet would implicitly receive the same performance benefit
> by avoiding the fork(2) call.

What you describe here is de facto a cooperative multitasking scheme
in a globally shared address space. Actually, it isn't even that: It's
a cooperative task-switching scheme in a globally shared address
space. This is something you really do not want to have, except on
extremely limited hardware.