Bugs in defconfig /bin/sh
Rob Landley
rob at landley.net
Sun Oct 3 00:35:50 UTC 2010
On Thursday 30 September 2010 17:15:44 Denys Vlasenko wrote:
> On Thursday 30 September 2010 23:29, Rob Landley wrote:
> > > > (Making {file,file} curly bracket support work would be darn nice
> > > > too, that's the biggest thing I miss. You can't even follow the
> > > > Linux From Scratch build instructions without that...)
> > >
> > > This is not easy. Consider three commands:
> > >
> > > v='/bin/*'; echo $v
> > > v='/bin/{a,b}*'; echo $v
> > > echo /bin/{a,b}*
> > >
> > > In the first case, unquoted $v is globbed *after* $v value is
> > > substituted, and echo prints long list of filenames in /bin.
> > >
> > > In the second case, echo prints just "/bin/{a,b}*"
> > >
> > > In the third case, echo prints all filenames in /bin which start from a
> > > and b.
> >
> > That's messed up, and it really sounds like the second case is a bug.
> >
> > I just fired up an Aboriginal system image with bash 2.05b (the much less
> > bloated version), and under that the third case _also_ prints
> > "/bin/{a,b}*", so the behavior of this corner case changed between bash
> > versions. Most likely somebody reported the third case as a bug and they
> > fixed the one defect report without fixing the general case, because
> > they're the FSF and everything is a special case to them.
> >
> > The _important_ test is just:
> >
> > v='/bin/*'; echo $v
> >
> > Which answers the simple question of glob precedence: does it happen
> > before or after the variable substitution? And if the answer is "after"
> > (which it is), then that's what we should do. Perform globbing after
> > substituting variables. {a,b} is part of globbing the same way * and ?
> > and [x-y] and such are.
> >
> > The fact that bash is a buggy pile of crap that can't even consistently
> > implement its own extensions is not our problem. We should do the simple
> > thing and wait for the FSF to catch up. (Specifically, wait for somebody
> > replying on this _obvious_bug_ to complain, and then tell them to
> > complain to the bash guys and make sure they don't _fix_ their
> > inconsistency first.)
>
> We can't do this.
In this context, does "can't" mean that the laws of physics don't allow it,
it's illegal in this jurisdiction, that it is beyond our capabilities, or that
you don't consider it a viable course of action?
> Regardless of what is correct or not (and in this case,
> it is debatable - it can easily be argued that brace expansion
> and glob expansion are two different things),
Wildcards can be inside the curly brackets as easily as outside:
blah/{o?e,two.*}/*.txt
So either resolution happens at the same time, or you make multiple resolution
passes in which case keeping track of what was quoted and what wasn't (and
what was already expanded and what wasn't) is INSANE.
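
For reference, the intermediate words current bash generates for that example can be inspected by turning off pathname expansion in a child shell. This is only a sketch: the function name show_brace_pass is made up for illustration, and it assumes a bash binary is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: print what bash's brace expansion produces for the
# example above, with globbing disabled ("set -f") so the wildcards survive.
show_brace_pass() {
    # The single-quoted script is interpreted by the child bash, not by us.
    bash -c 'set -f; echo blah/{o?e,two.*}/*.txt'
}

show_brace_pass
# prints: blah/o?e/*.txt blah/two.*/*.txt
```

Each generated word still contains unexpanded wildcards, which bash then globs as a separate step.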
Beyond that, there's no purpose to the behavioral differences. They were
clearly incidental, rather than designed. Making these two cases differ in
behavior:
v='/bin/{a,b}*'; echo $v
echo /bin/{a,b}*
When these two _don't_ differ in behavior:
v='/bin/*'; echo $v
echo /bin/*
It serves no purpose. It's too subtle for users to keep track of and try to
intentionally use in their own programs, it can only come up
_unintentionally_. It is a clear bug, and the fact the behavior differs from
bash version to bash version (to act _more_ like the simple wildcard case) is
evidence in favor of that position.
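
The cases under discussion can be checked against whichever shell is at hand. The following is a minimal sketch, not a test suite: the function name brace_glob_cases and the file names apple, bean and cherry are invented for illustration, and the shell to test is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: run the three commands from this thread under the
# shell named in $1, inside a scratch directory with known contents.
brace_glob_cases() {
    shell=$1
    dir=$(mktemp -d) || return 1
    touch "$dir/apple" "$dir/bean" "$dir/cherry"
    (
        cd "$dir" || exit 1
        # The single-quoted script is interpreted by "$shell", not by us.
        "$shell" -c '
            v="*";      echo case1: $v
            v="{a,b}*"; echo case2: $v
            echo case3: {a,b}*
        '
    )
    status=$?
    rm -rf "$dir"
    return $status
}
```

Running `brace_glob_cases bash` with a recent bash prints "case1: apple bean cherry", "case2: {a,b}*" and "case3: apple bean". A shell with no brace support at all, such as dash, prints the literal "{a,b}*" for case 3 as well.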
> if we want to have more users,
More than we have right now, when we don't support {} at _all_ even in the
trivial completely unquoted cases with no wildcards present?
If we want more users, then we need to run their scripts. Don't worry about
what bash does or doesn't do in made-up examples, show me scripts people are
currently using which don't run under hush. And the only way to do that is
to approximate bash's behavior closely enough that people _try_ to run those
scripts, meaning adding trivial {} support and then seeing where it's
insufficient and needs to be expanded.
> we should be compatible with bash (latest one, if older ones
> are different).
You're suggesting a need to be bug-for-bug compatible with something that
changes its behavior every version. That's not a viable course of action.
The fact that the latest ones differ in behavior from the earlier ones shows
that there was no actual _intent_ behind this behavior. And the fact that
Ubuntu managed to shove the Defective Annoying SHell down people's throats
kind of makes this entire nitpicking argument seem ludicrous in context.
Our 50k shell will never be 100% bug-for-bug compatible with a megabyte-sized
twisty mess of spaghetti, and we shouldn't _try_ to be. If they really
need bash they can run bash, what we need to give them is something that's
good enough for as many people as we can while staying small and simple.
Right now, it's not good enough for most people. You're proposing that it
should stop being small and simple. Neither of those is a good thing.
> Users don't care that much which way is "more correct";
> but they do care a lot about having their scripts run correctly
> even in corner cases when they migrate from bash to, say, hush.
I note that dash still doesn't handle {} at all, and thus can't be used to run
this bit of LFS:
http://www.linuxfromscratch.org/lfs/view/6.7/chapter06/creatingdirs.html
Now LFS gets away with that because they build bash in chapter 5, so they know
what they're running with. But I make my own chapter 5 equivalent out of an
Aboriginal Linux root filesystem and I need to build bash 2.05b to get it to
work. I cannot use hush for that case, because of the lack of trivial {}
support. Not because of some "quoted string stuck in a variable then gets
expanded in a way that changes the precedence of braces vs wildcards" corner
case bug, but because the basic functionality isn't there.
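
To illustrate what "trivial {} support" saves, here is roughly what a directory-creation step looks like when it has to be rephrased for a shell without brace expansion. The directory list is illustrative only, not copied from the LFS page, and make_tree is a made-up name.

```shell
#!/bin/sh
# Brace-free equivalent of: mkdir -p "$root"/{bin,etc/opt,usr/{bin,lib}}
# Hypothetical helper; $1 is the root under which to create the tree.
make_tree() {
    root=$1
    # Every combination the braces would generate is spelled out by hand.
    for d in bin etc/opt usr/bin usr/lib; do
        mkdir -p "$root/$d" || return 1
    done
}
```

For example, `make_tree /tmp/scratch` creates /tmp/scratch/bin, /tmp/scratch/etc/opt, /tmp/scratch/usr/bin and /tmp/scratch/usr/lib.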
Our shell should be better than dash, but it does NOT need to worry about
corner cases where Bash itself changes its behavior from version to version.
Instead we need to be able to run things like Portage. And we can submit
patches to portage, such as the patch I came up with to make the sucker run
with bash 3 because bash stopped needing quotes around a regex with a space
in it that occurs inside a [[ $blah =~ regex with spaces ]] context. (In bash
3 you needed to quote that, in bash 4 the parser magically knew the regex
continued.)
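
One commonly used way to sidestep this kind of quoting skew is to keep the pattern in a variable and leave the variable unquoted on the right-hand side of =~. The sketch below is not the portage patch mentioned above; regex_match is a hypothetical name, and the test is run in a child bash so the wrapper itself stays plain sh.

```shell
#!/bin/sh
# Hypothetical helper: succeed if string $1 matches extended regex $2.
# The regex may contain spaces; it reaches [[ ]] through a variable, so no
# quoting decisions have to be made inside the [[ ... =~ ... ]] itself.
regex_match() {
    bash -c 're=$2; [[ $1 =~ $re ]]' _ "$1" "$2"
}
```

For example, `regex_match "foo bar baz" "o b.r"` succeeds, while `regex_match "foobar" "o b"` fails.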
But the important thing in that context wasn't that bash 4's behavior changed
in yet another obscure corner case between versions (that happens ALL THE
TIME), it was specifically that the current version of portage wouldn't run.
And when I asked the portage guys about it they went "oh, that's a regression,
send us the patch".
Right now we're missing major functionality. For example, we haven't got
redirect to/from program, ala:
diff -u <(zcat file1) <(zcat file2)
Whether or not they have to slightly rephrase their scripts, for the same kind
of version skew they already have to deal with between bash versions, is a
separate issue from "there simply is no way to do this without completely
rewriting your script".
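
To make the size of that rewrite concrete, here is a rough sketch of the diff example done without <(...), using temporary files. The name diff_gz is made up, gzip -dc stands in for zcat, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: diff two gzip-compressed files without process
# substitution. Returns diff's exit status, or 2 if setup fails.
diff_gz() {
    tmp=$(mktemp -d) || return 2
    # Decompress both inputs to scratch files, cleaning up on failure.
    gzip -dc "$1" > "$tmp/a" && gzip -dc "$2" > "$tmp/b" || {
        rm -rf "$tmp"
        return 2
    }
    diff -u "$tmp/a" "$tmp/b"
    status=$?
    rm -rf "$tmp"
    return $status
}
```

It works, but every call site that used to be a one-liner now needs scratch space and cleanup, and the diff headers show the temporary names rather than the original ones.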
Which means that right now nobody is _trying_ to run serious scripts with the
busybox shell, thus we're not getting user-directed feedback about what needs
to be fixed. Even I am just building bash and ignoring it. I'd like to stop
doing that, but taking a "the perfect is the enemy of the good" approach and
refusing to improve what we've got until it's _interesting_ unless we commit
to reproducing the full horror of every bloated corner case... Infrastructure
in search of a user is bad. Complicating the code to reproduce _bugs_ in
another implementation would only be worth doing if significant use cases were
blocked by that, and right now I'm unaware of _any_ existing scripts blocked
on that specific bug. I'm aware of tons blocked on blah/{blah,blah} expansions
that don't even have wildcards in the same argument, let alone trying to make
the separate the {precedence
The BusyBox project is the 80/20 rule personified. We write the 20% of the
code that covers 80% of the use cases, _then_ we listen to user feedback about
specific things they're missing. Do the simple thing first, only complicate
when the benefit is shown to significantly outweigh the cost. Otherwise, you
might as well just use the gnu bloatware in the first place.
Marvelous video covering the 80/20 rule, among other things:
http://www.ted.com/talks/clay_shirky_on_institutions_versus_collaboration.html
Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.