[patch] Not quite bbsh. :)

Sat Sep 2 19:30:31 UTC 2006

On Saturday 02 September 2006 12:36 pm, Bernhard Fischer wrote:
> On Fri, Sep 01, 2006 at 05:20:45PM -0400, Rob Landley wrote:
> >So here's a little toy I made; the world's most pathetic shell.  It's an 
> >experiment to see how small I can get a shell down to, and it does indeed 
> >beat lash in both size and lameness.
> 
> >+static int handle(char *command)
> >+{
> [snip]
> >+	if (!argc) return 0;
> >+	if (argc==2 && !strcmp(argv[0],"cd")) chdir(argv[1]);
> 
> How should we deal with "builtin" commands?

Still working that out. :)

I'm leaning towards the lash method.  Here I was just implementing the 
absolute essentials (cd and exit) quick-and-dirty, and inline if statements 
were the fastest way to do that.  Some things you can't unload on a child 
process because the point of them is to mess with your current process's 
state.  A lot of stuff like builtin echo is just a performance optimization 
which can be handled via nofork versions of standard busybox applets.  
But "cd" doesn't make sense as a child process, you _can't_ fork and make 
that work.

> Should we possibly (optionally) provide "cd" et al as applets?

I was thinking about that yesterday.  There are two downsides:

First, I want scripts/individual to be able to build a usable version of this 
thing.  Relying on the applet infrastructure for builtins makes this harder.

Secondly, we'd have to teach the main applet mechanism not just about NOFORK, 
but also that there are some applets you don't want to make symlinks for.  
Things like "cd", "read", "exit", "exec", "export", and "source" (also known 
as ".") make no sense whatsoever as separate processes, because what they do 
goes away when the new process does.  (Things like "echo" and "test" can be 
run with nofork, but also make sense as a standalone applet.)

Lash didn't try to integrate its builtins into the main applet infrastructure, 
instead it just made its own builtin function table.  The question is: is it 
smaller (and simpler) to teach busybox's applet.c about stuff that you 
don't --install (and get it out of busybox.links too, and figure out whether 
it should be listed in BusyBox.html as a separate entry or grouped under 
bbsh)...  Or is it easier to just make another array of function pointers 
with attached help text?  (Keep in mind I want to teach run_applet_by_name() 
about NOFORK anyway.  If we've got a perfectly usable echo or test, I don't 
want to have to re-implement it for speed purposes.  Note that NOFORK implies 
ENABLE_FEATURE_CLEAN_UP actually _happens_ in that applet.  Wondered why I've 
cared about that?  We might want to figure out how to enable it on a 
per-applet basis, or perhaps ENABLE_FEATURE_CLEAN_UP || ENABLE_BBSH.)

I'd probably be more interested in trying to make inline if() statements scale 
if it wasn't for the help thing.  Even lash has a "help" entry that lists 
available commands.  It should list all the builtin busybox applets, plus any 
builtin local applets (like cd and export).  On the other hand, if we want 
the help entry for bbsh to list its built-ins anyway, possibly recycling 
the --help mechanism for that makes sense?
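
For comparison, here's roughly what the "another array of function pointers 
with attached help text" option might look like.  This is a sketch in the 
spirit of the lash table, not lash's actual code: struct builtin, 
find_builtin(), and the help strings are all invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical builtin table; none of these names come from the patch. */
struct builtin {
	const char *name;
	const char *help;
	int (*fn)(int argc, char **argv);
};

static int builtin_cd(int argc, char **argv)
{
	return (argc == 2 && !chdir(argv[1])) ? 0 : 1;
}

static int builtin_exit(int argc, char **argv)
{
	exit(argc > 1 ? atoi(argv[1]) : 0);
}

static int builtin_help(int argc, char **argv);

static const struct builtin builtins[] = {
	{ "cd",   "cd DIR: change working directory",        builtin_cd },
	{ "exit", "exit [N]: leave the shell with status N", builtin_exit },
	{ "help", "help: list builtin commands",             builtin_help },
};
#define NBUILTINS (sizeof(builtins) / sizeof(builtins[0]))

/* The help builtin just walks the same table the dispatcher uses. */
static int builtin_help(int argc, char **argv)
{
	size_t i;

	(void)argc;
	(void)argv;
	for (i = 0; i < NBUILTINS; i++) puts(builtins[i].help);
	return 0;
}

/* Returns the matching entry, or NULL so the caller can fall back to
 * fork and exec. */
static const struct builtin *find_builtin(const char *name)
{
	size_t i;

	assert(name);
	for (i = 0; i < NBUILTINS; i++)
		if (!strcmp(name, builtins[i].name)) return &builtins[i];
	return NULL;
}
```

The upside is that "help" falls out of the table for free.  The downside is 
that it's a second table next to the applet one, which is exactly the 
duplication question above.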

You wonder why it's taking me so long to design all this?

> Theoretical nitpicks about the codeline above:
> 1) The strcmp call is much larger than comparing these by hand

There's a certain amount of "quick and dirty" here.  I wasn't trying to find 
the most optimized way of doing this, I was trying to figure out what the 
least amount of work it needed to do was.

Also, the builtins will probably get moved out to something more like the lash 
builtin_xxx function table, but I haven't quite decided yet.  (Depends on what 
kind of impact environment variable handling has.  That part's still 
wet, as it were.)

It's easier to configure down a larger table of builtins to have just a couple 
of entries than it is to have two different code paths depending on whether 
we're being big or small.  If I go to the table, then everything goes to the 
table.  (Can the if() else if() else tree be made to scale?  At what point 
does it get bigger than the table version?  How big is the extra call to 
strcmp() compared to the overhead of a function call?  Can I get away with 
making the current argc/argv pair and the return value globals so these 
functions can be void blah(void) and avoid any stack setup at all?  If I did, 
would it be a good idea from a maintenance perspective?)

> 2) so if i invoke cdplayer /my/music_repo/*.au  then you do what
> exactly? ;)

Currently?  Feed cdplayer a filename with an * in it.  I mentioned that it 
doesn't handle globbing yet.

Right now, this clump:

        // Grab next word.  (Add dequote and envvar logic here)
        end=start;
        while(*end && !isspace(*end)) end++;
        argv[argc++]=xstrndup(start,end-start);
        start=end;

Is designed to be ripped out and replaced with something more like:

	start += grab_next_word(start, &argv);

And grab_next_word() could easily add more than one entry to argv.  That 
function is where wildcards go (I.E. globbing), and environment variable 
handling, and quote handling, and backslash escapes, and lots of other stuff.
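
A rough sketch of the shape, under some loud assumptions: struct arglist and 
arg_add() are invented here (the one-liner above passes &argv instead), and 
I'm leaning on the C library's glob() for the wildcard part, which may or may 
not be what the real thing ends up doing.  No quoting, no environment 
variables, no backslash escapes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <glob.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable argument list; the patch uses plain argc/argv. */
struct arglist {
	int argc;
	char **argv;
};

static void arg_add(struct arglist *al, const char *word)
{
	/* Keep one spare slot so argv stays NULL terminated for execvp(). */
	char **grown = realloc(al->argv, (al->argc + 2) * sizeof(char *));

	if (!grown) abort();
	al->argv = grown;
	al->argv[al->argc] = strdup(word);
	if (!al->argv[al->argc]) abort();
	al->argv[++al->argc] = NULL;
}

/* Consume leading whitespace plus one word from start, append every
 * expansion of that word to al, and return the number of bytes consumed. */
static int grab_next_word(const char *start, struct arglist *al)
{
	const char *word_start, *end;
	char *word;
	glob_t g;
	size_t i;

	assert(start && al);
	word_start = start;
	while (isspace((unsigned char)*word_start)) word_start++;
	end = word_start;
	while (*end && !isspace((unsigned char)*end)) end++;
	if (end == word_start) return (int)(end - start);	/* only whitespace */

	word = strndup(word_start, end - word_start);
	if (!word) abort();

	/* GLOB_NOCHECK hands back the pattern itself when nothing matches,
	 * so an unmatched wildcard is passed through unchanged. */
	if (!glob(word, GLOB_NOCHECK, NULL, &g)) {
		for (i = 0; i < g.gl_pathc; i++) arg_add(al, g.gl_pathv[i]);
		globfree(&g);
	} else arg_add(al, word);
	free(word);

	return (int)(end - start);
}
```

The caller keeps doing start += grab_next_word(start, &al) until it hits the 
end of the string, and one call on "*.au" can add many entries.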

The tricky bit is: that's not enough.  Pipe handling means that the loop 
converting command into argv[] has to be _within_ a loop, because the result 
isn't just one argv[] but an array of them (one per pipe segment) plus 
information about how they feed into each other (redirects).  I'm currently 
trying to put together a nice clean way to get that to #ifdef out so it 
shrinks down to something reasonably close to this small version when you 
don't need it.  That's probably the second code drop. :)
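
To show what "a loop within a loop" means, here's a minimal sketch.  The 
struct names, grow(), and parse_pipeline() are all hypothetical, it only 
splits on '|' and whitespace, and it carries no redirect information at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: one argv[] per pipe segment.  Redirect info would
 * hang off struct segment too, but is left out here. */
struct segment {
	int argc;
	char **argv;
};

struct pipeline {
	int count;
	struct segment *seg;
};

static void *grow(void *p, size_t n)
{
	p = realloc(p, n);
	if (!p) abort();
	return p;
}

/* Split command on '|', then split each piece on whitespace.  There is no
 * quoting, so a '|' inside quotes is (wrongly) treated as a pipe. */
static struct pipeline parse_pipeline(const char *command)
{
	struct pipeline pl = { 0, NULL };
	const char *start = command;

	assert(command);
	for (;;) {
		struct segment *s;

		pl.seg = grow(pl.seg, (pl.count + 1) * sizeof(*pl.seg));
		s = &pl.seg[pl.count++];
		s->argc = 0;
		s->argv = grow(NULL, sizeof(char *));
		s->argv[0] = NULL;

		/* Inner loop: the word grabbing from the small version. */
		for (;;) {
			const char *end;

			while (isspace((unsigned char)*start)) start++;
			if (!*start || *start == '|') break;
			end = start;
			while (*end && *end != '|' && !isspace((unsigned char)*end))
				end++;
			s->argv = grow(s->argv, (s->argc + 2) * sizeof(char *));
			s->argv[s->argc] = strndup(start, end - start);
			if (!s->argv[s->argc]) abort();
			s->argv[++s->argc] = NULL;
			start = end;
		}
		if (*start != '|') break;
		start++;	/* step over the pipe and parse the next segment */
	}
	return pl;
}
```

With pipes configured out, the outer loop runs exactly once and you're back 
to a single argv[], which is the property I want the real code to keep.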

And THAT is the hard part of all this.  Not implementing the functionality, 
but making it cleanly separated enough that fundamental things like pipes and 
redirects can be a compile-time option without resulting in horrible 
duplication separated by unreadable #ifdefs.

I'm working on it.  This would go a lot faster if work didn't want me working 
on other things as my top priority right now.  (Whee!  Three day weekend!)

Right now, I'm trying to figure out what's necessary to teach it pipes and 
redirects.  And _that_ means I have to do more intelligent tokenizing to 
catch "thingy;thingy" with no spaces, but I still want quoting, environment 
variables, and pipelines to be separately selectable config options...

P.S.  Yes, I'm looking at Yann's quoting code in modprobe and trying to figure 
out if I can stick something shared in libbb that both can use... :)

> >+	else if(!strcmp(argv[0],"exit")) exit(argc>1 ? atoi(argv[1]) : 0); 
> >+	else {
> >+		int status;
> >+		pid_t pid=fork();
> >+		if(!pid) {
> >+			run_applet_by_name(argv[0],argc,argv);
> >+			execvp(argv[0],argv);
> >+			printf("No %s",argv[0]);
> >+			exit(1);
> >+		} else waitpid(pid, &status, 0);
> Another theoretical nitpick (can't resist, sorry :)

Go for it.  I opened myself up to this when I posted a code sample. :)

> status is superfluous, better make that read waitpid(pid,NULL,0); and
> ditch it alltogether, at least if no jobcontrol is selected or something
> like this.

Status is currently superfluous, yes.  (That's another of the ragged edges 
where "code plugs in here".  I filed down most of them for this example, but 
missed that one...)

Ok, what cares about exit status (checks notes):
  Job control.
  Environment variable support ($?).
  Flow control (if, while, etc).
  Pipes & Redirects (which covers && || & ; and friends)...

Can I have each of those without the others?  The first three don't depend on 
each other, but I think job control and maybe flow control probably depend on 
pipes and redirects.  Hmmm...

Job control needs to be able to background processes with & to make much 
sense.  Well, I suppose you could use ctrl-z and bg, but that's stretching it, 
and it requires terminal control, so it's going to depend on _something_ 
either way.  Plus job control needs the infrastructure for tying together 
processes into a pipeline in order to extend that into manipulating groups of 
jobs with fg and bg and such.

I suppose I can do if statements without &&, so flow control doesn't quite 
have to depend on pipes and redirects.  At the same time, I need _some_ kind 
of structure representing a process in order to save the exit code and use it 
later.  (I have lots of notes on this, but assembling them into dependency 
trees is tricky.)
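
The smallest version of that structure I can think of looks something like 
the sketch below.  struct process and run_and_record() are hypothetical names, 
and the 127 and 128+signal values are just the usual shell conventions, not 
anything bbsh has committed to.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-process record; bbsh has no such struct yet. */
struct process {
	pid_t pid;
	int exit_code;	/* what $? or an if statement would look at */
};

/* Fork, exec argv, wait, and record the result.  Returns 0 if the child
 * was waited for, -1 if fork() or waitpid() failed. */
static int run_and_record(char **argv, struct process *p)
{
	int status;

	assert(argv && argv[0] && p);
	p->pid = fork();
	if (p->pid < 0) return -1;
	if (!p->pid) {
		execvp(argv[0], argv);
		_exit(127);	/* conventional "command not found" code */
	}
	if (waitpid(p->pid, &status, 0) < 0) return -1;

	if (WIFEXITED(status)) p->exit_code = WEXITSTATUS(status);
	else if (WIFSIGNALED(status)) p->exit_code = 128 + WTERMSIG(status);
	else p->exit_code = -1;
	return 0;
}
```

This is where the "superfluous" status variable stops being superfluous: 
anything that wants $? or flow control reads exit_code afterwards.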

> [snip]
> Looks pretty small indeed :)

Yup.  That was the point. :)

The actual minimal configuration of bbsh will probably be slightly larger than 
this, just because some things might wind up being a huge pain to configure 
out (like tracking exit status or using a builtin function table).  But this 
is a frame of reference so I know what "small" looks like, and can track 
bloat relative to there.

Rob
-- 
Never bet against the cheap plastic solution.