Bugs in defconfig /bin/sh

Rob Landley rob at landley.net
Sun Oct 3 00:35:50 UTC 2010


On Thursday 30 September 2010 17:15:44 Denys Vlasenko wrote:
> On Thursday 30 September 2010 23:29, Rob Landley wrote:
> > > > (Making {file,file} curly bracket support work would be darn nice
> > > > too, that's the biggest thing I miss.  You can't even follow the
> > > > Linux From Scratch build instructions without that...)
> > >
> > > This is not easy. Consider three commands:
> > >
> > > v='/bin/*'; echo $v
> > > v='/bin/{a,b}*'; echo $v
> > > echo /bin/{a,b}*
> > >
> > > In the first case, unquoted $v is globbed *after* $v value is
> > > substituted, and echo prints long list of filenames in /bin.
> > >
> > > In the second case, echo prints just "/bin/{a,b}*"
> > >
> > > In the third case, echo prints all filenames in /bin which start from a
> > > and b.
> >
> > That's messed up, and it really sounds like the second case is a bug.
> >
> > I just fired up an Aboriginal system image with bash 2.05b (the much less
> > bloated version), and under that the third case _also_ prints
> > "/bin/{a,b}*", so the behavior of this corner case changed between bash
> > versions.  Most likely somebody reported the third case as a bug and they
> > fixed the one defect report without fixing the general case, because
> > they're the FSF and everything is a special case to them.
> >
> > The _important_ test is just:
> >
> >   v='/bin/*'; echo $v
> >
> > Which answers the simple question of glob precedence: does it happen
> > before or after the variable substitution?  And if the answer is "after"
> > (which it is), then that's what we should do.  Perform globbing after
> > substituting variables. {a,b} is part of globbing the same way * and ?
> > and [x-y] and such are.
> >
> > The fact that bash is a buggy pile of crap that can't even consistently
> > implement its own extensions is not our problem.  We should do the simple
> > thing and wait for the FSF to catch up.  (Specifically, wait for somebody
> > replying on this _obvious_bug_ to complain, and then tell them to
> > complain to the bash guys and make sure they don't _fix_ their
> > inconsistency first.)
>
> We can't do this.

In this context, does "can't" mean that the laws of physics don't allow it, 
it's illegal in this jurisdiction, that it is beyond our capabilities, or that 
you don't consider it a viable course of action?

> Regardless of what is correct or not (and in this case,
> it is debatable - it can easily be argued that brace expansion
> and glob expansion are two different things),

Wildcards can be inside the curly brackets as easily as outside:

  blah/{o?e,two.*}/*.txt

So either resolution happens at the same time, or you make multiple resolution 
passes in which case keeping track of what was quoted and what wasn't (and 
what was already expanded and what wasn't) is INSANE.
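For reference, here is a rough way to see what an installed bash does with that exact pattern. It is only a sketch: the directory names (one, two.d, six) and file names are made up for illustration, and everything happens in a scratch directory. The case runs through `bash -c` on purpose, since /bin/sh may be a shell with no brace support at all.

```shell
# Scratch tree: blah/one, blah/two.d and blah/six are invented names.
d=$(mktemp -d)
mkdir -p "$d/blah/one" "$d/blah/two.d" "$d/blah/six"
touch "$d/blah/one/a.txt" "$d/blah/two.d/b.txt" "$d/blah/six/c.txt"

# Wildcards sit both inside and outside the curly brackets.
mixed=$(cd "$d" && bash -c 'echo blah/{o?e,two.*}/*.txt')
printf '%s\n' "$mixed"

rm -rf "$d"
```

The bash manual describes brace expansion as running before the other expansions, so bash takes the multiple-pass route here: the braces first produce blah/o?e/*.txt and blah/two.*/*.txt, and each of those words is then globbed separately. Only the files under one and two.d should show up.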

Beyond that, there's no purpose to the behavioral differences.  They were 
clearly incidental, rather than designed.  Making these two cases differ in 
behavior:
  v='/bin/{a,b}*'; echo $v
  echo /bin/{a,b}*

When these two _don't_ differ in behavior:
  v='/bin/*'; echo $v
  echo /bin/*

It serves no purpose.  It's too subtle for users to keep track of and try to 
intentionally use in their own programs, it can only come up 
_unintentionally_.  It is a clear bug, and the fact the behavior differs from 
bash version to bash version (to act _more_ like the simple wildcard case) is 
evidence in favor of that position.
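The four cases can be compared side by side with something like the following. This is a sketch, not a test suite: it uses a scratch directory with three invented file names instead of /bin, so the output does not depend on the machine, and it calls bash explicitly because the behavior under discussion is bash's.

```shell
# Scratch directory, so the result does not depend on what lives in /bin.
d=$(mktemp -d)
touch "$d/apple" "$d/banana" "$d/cherry"

# Run each case under bash explicitly: /bin/sh may well be another shell.
glob_var=$(cd "$d" && bash -c 'v="*"; echo $v')
glob_direct=$(cd "$d" && bash -c 'echo *')
brace_var=$(cd "$d" && bash -c 'v="{a,b}*"; echo $v')
brace_direct=$(cd "$d" && bash -c 'echo {a,b}*')

printf 'glob via variable:   %s\n' "$glob_var"
printf 'glob typed directly: %s\n' "$glob_direct"
printf 'brace via variable:  %s\n' "$brace_var"
printf 'brace typed directly: %s\n' "$brace_direct"

rm -rf "$d"
```

With a bash that behaves as described in this thread, the two glob lines agree (all three files), while the two brace lines do not: the variable case prints {a,b}* untouched and the direct case prints apple banana.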

> if we want to have more users,

More than we have right now, when we don't support {} at _all_ even in the 
trivial completely unquoted cases with no wildcards present?

If we want more users, then we need to run their scripts.  Don't worry about 
what bash does or doesn't do in made-up examples, show me scripts people are 
currently using which don't run under hush.  And the only way to do that is 
to approximate bash's behavior closely enough that people _try_ to run those 
scripts, meaning adding trivial {} support and then seeing where it's 
insufficient and needs to be expanded.

> we should be compatible with bash (latest one, if older ones
> are different).

You're suggesting a need to be bug-for-bug compatible with something that 
changes its behavior every version.  That's not a viable course of action.

The fact that the latest ones differ in behavior from the earlier ones shows 
that there was no actual _intent_ behind this behavior.  And the fact that 
Ubuntu managed to shove the Defective Annoying SHell down people's throats 
kind of makes this entire nitpicking argument seem ludicrous in context.

Our 50k shell will never be 100% bug-for-bug compatible with a megabyte-sized 
twisty mess of spaghetti, and we shouldn't _try_ to be.  If they really 
need bash they can run bash, what we need to give them is something that's 
good enough for as many people as we can while staying small and simple.

Right now, it's not good enough for most people.  You're proposing that it 
should stop being small and simple.  Neither of those is a good thing.

> Users don't care that much which way is "more correct";
> but they do care a lot about having their scripts run correctly
> even in corner cases when they migrate from bash to, say, hush.

I note that dash still doesn't handle {} at all, and thus can't be used to run 
this bit of LFS:

  http://www.linuxfromscratch.org/lfs/view/6.7/chapter06/creatingdirs.html

Now LFS gets away with that because they build bash in chapter 5, so they know 
what they're running with.  But I make my own chapter 5 equivalent out of an 
Aboriginal Linux root filesystem and I need to build bash 2.05b to get it to 
work.  I cannot use hush for that case, because of the lack of trivial {} 
support.  Not because of some "quoted string stuck in a variable then gets 
expanded in a way that changes the precedence of braces vs wildcards" corner 
case bug, but because the basic functionality isn't there.

Our shell should be better than dash, but it does NOT need to worry about 
corner cases where Bash itself changes its behavior from version to version.

Instead we need to be able to run things like Portage.  And we can submit 
patches to Portage, such as the patch I came up with to make the sucker run 
with bash 3, because bash stopped needing quotes around a regex with spaces 
in it that occurs inside a [[ $blah =~ regex with spaces ]] context.  (In bash 
3 you needed to quote that, in bash 4 the parser magically knew the regex 
continued.)
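One common way for a script to sidestep that kind of version skew, offered here as a sketch and not as the actual Portage patch, is to keep the regex in a variable and leave the variable unquoted on the right-hand side. Bash spells the operator `=~` and it lives inside `[[ ]]`. The variable names and the sample string below are invented for illustration.

```shell
# The pattern contains a space.  Holding it in a variable avoids the
# question of how a given bash version parses quotes to the right of =~.
result=$(bash -c '
  re="^foo bar[0-9]+\$"
  s="foo bar42"
  if [[ $s =~ $re ]]; then echo match; else echo no-match; fi
')
printf '%s\n' "$result"
```

The point of the design is that nothing on the right of `=~` needs quoting, so the parser never has to decide where a regex with spaces in it ends.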

But the important thing in that context wasn't that bash 4's behavior changed 
in yet another obscure corner case between versions (that happens ALL THE 
TIME), it was specifically that the current version of portage wouldn't run.  
And when I asked the portage guys about it they went "oh, that's a regression, 
send us the patch".

Right now we're missing major functionality.  For example, we haven't got 
redirect to/from program, ala:

  diff -u <(zcat file1) <(zcat file2)

Whether or not they have to slightly rephrase their scripts, for the same kind 
of version skew they already have to deal with between bash versions, is a 
separate issue from "there simply is no way to do this without completely 
rewriting your script".
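To give a sense of the size of that rewrite, here is roughly what the one-line diff turns into in a shell without <( ) support. This is a sketch using named pipes; the file names and contents are invented, and `gzip -dc` stands in for zcat only because some systems ship a zcat that insists on .Z files.

```shell
# Two small compressed inputs with made-up contents.
d=$(mktemp -d)
printf 'one\ntwo\n' | gzip -c > "$d/file1.gz"
printf 'one\n2\n'   | gzip -c > "$d/file2.gz"

# One FIFO per <( ) in the original command line.
mkfifo "$d/left" "$d/right"
gzip -dc "$d/file1.gz" > "$d/left" &
gzip -dc "$d/file2.gz" > "$d/right" &

# diff exits 1 when the inputs differ, so capture the status explicitly.
status=0
out=$(diff -u "$d/left" "$d/right") || status=$?
wait    # reap the two background decompressors

printf '%s\n' "$out"
rm -rf "$d"
```

Temporary files would work as well, at the cost of writing the decompressed data to disk. Either way a single readable line becomes a dozen, plus cleanup.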

Which means that right now nobody is _trying_ to run serious scripts with the 
busybox shell, thus we're not getting user-directed feedback about what needs 
to be fixed.  Even I am just building bash and ignoring it.  I'd like to stop 
doing that, but taking a "the perfect is the enemy of the good" approach and 
refusing to improve what we've got until it's _interesting_ unless we commit 
to reproducing the full horror of every bloated corner case...  Infrastructure 
in search of a user is bad.  Complicating the code to reproduce _bugs_ in 
another implementation would only be worth doing if significant use cases were 
blocked by that, and right now I'm unaware of _any_ existing scripts blocked 
on that specific bug.  I'm aware of tons blocked on blah/{blah,blah} expansions 
that don't even have wildcards in the same argument, let alone trying to make 
the separate the {precedence 

The BusyBox project is the 80/20 rule personified.  We write the 20% of the 
code that covers 80% of the use cases, _then_ we listen to user feedback about 
specific things they're missing.  Do the simple thing first, only complicate 
when the benefit is shown to significantly outweigh the cost.  Otherwise, you 
might as well just use the gnu bloatware in the first place.

Marvelous video covering the 80/20 rule, among other things:

  http://www.ted.com/talks/clay_shirky_on_institutions_versus_collaboration.html

Rob
-- 
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem 
Forever, and as welcome as New Coke.

