Bug in wc.
Rob Landley
rob at landley.net
Mon Mar 8 18:54:05 UTC 2010
On Sunday 07 March 2010 21:07:00 Denys Vlasenko wrote:
> On Monday 08 March 2010 01:48, Rob Landley wrote:
> > And VMLINUX_SIZE is:
> >
> > VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
> > cut -d' ' -f1)
> >
> > VMLINUX_SIZE is blank when using busybox tools.
> >
> > The underlying behavioral wonkiness in busybox "wc" is:
> >
> > $ busybox wc -c vmlinux
> >   3335777 vmlinux
> > $ wc -c vmlinux
> > 3335777 vmlinux
> >
> > Note that we have leading whitespace and the GNU version doesn't. This
> > leading whitespace is confusing the kernel build, because the cut -d' '
> > then triggers on our leading whitespace and produces an empty string,
> > which propagates through the rest of the build to confuse the linker with
> > a start address of "0x".
> >
> > Why do we have unnecessary leading whitespace? What happened to small and
> > simple and doing no more than absolutely necessary?
>
> Good question, I'm redirecting it to author of busybox-1.2.1 (or earlier)
> since 1.2.1 displays the same behavior. ;)
Actually, I don't remember ever touching wc. (I'm going to have to fight with
git, aren't I? I really hate git's UI...)
It looks like wc was completely rewritten to make it much more complicated in
cad5364599e back in 2003, and it was essentially untouched (if you don't count
removing trailing whitespace and tweaking the GPL boilerplate) until you added
a special case in 2006:
commit 3ed001ff2631ad6911096148f47a2719a5b6d4f4
Author: Denis Vlasenko <vda.linux at googlemail.com>
Date: Fri Sep 29 23:41:04 2006 +0000
wc: reduce source cruft, make it so that "wc -c" (one option, no filenames)
will not print leading blanks.
Which would have addressed this problem (and prevented the mips 2.6.33 kernel
build from breaking) if it wasn't a special case. This cleanup seems to have
added complexity rather than removing it.
But really, I don't care so much why it's doing what it's doing now as how to
fix it. It looks like what the other wc is doing is holding all output to the
end and calculating the longest string, and prepending spaces for that. The
pathological case is (in the current busybox source):
$ wc -c INSTALL README AUTHORS
 5833 INSTALL
 8768 README
 5171 AUTHORS
19772 total
Meaning it has to know all lines before it outputs any, which is really not
busybox's style.
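To make the cost of that alignment concrete, here is a minimal sketch of the "hold everything, then pad" approach as described above. It is not the code of any real wc implementation; the function name align_counts is made up for illustration, and it assumes each input line is "count name" with no spaces in the name.

```shell
# Hypothetical helper: right-align the count column of "count name" lines.
# Nothing can be printed until the last line has been read, because the
# column width depends on the longest count.
align_counts() {
    awk '{
             count[NR] = $1
             name[NR] = $2            # assumes names contain no spaces
             if (length($1) > width) width = length($1)
         }
         END {
             fmt = "%" width "s %s\n" # build the padded format once the width is known
             for (i = 1; i <= NR; i++) printf fmt, count[i], name[i]
         }'
}
```

Feeding it the four unpadded lines from the example above should reproduce the aligned listing, with only the longest number ("total") starting in the first column.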
And which really isn't all that _interesting_, to be honest. I'd be pretty
happy if we never prepended the space and tried to line up the columns at all.
The longest number will _never_ have prepended space, so anything that tries
to parse this multi-column output must deal with the no leading whitespace,
and must therefore treat leading whitespace as _optional_ rather than
required.
Where we're getting bit is programs depending on the longest number not having
any prepended whitespace. Meaning when there's one number output, it's the
longest number, therefore there should never be prepended whitespace to it.
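The failure mode can be reproduced without either wc by feeding cut a simulated line. This is only a sketch: the two sample strings stand in for padded and unpadded wc output, and the helper names field_cut and field_awk are invented for the example.

```shell
# Simulated wc output lines (stand-ins, not captured from a real run).
padded='  3335777 vmlinux'
unpadded='3335777 vmlinux'

# cut treats every single space as a field separator, so any leading
# space makes field 1 the empty string.
field_cut() { printf '%s\n' "$1" | cut -d' ' -f1; }

# awk splits on runs of whitespace and ignores leading blanks, so it
# treats the padding as optional.
field_awk() { printf '%s\n' "$1" | awk '{ print $1 }'; }

field_cut "$padded"     # prints an empty line
field_cut "$unpadded"   # prints 3335777
field_awk "$padded"     # prints 3335777
```

A parser written like field_awk works on both forms, while one written like field_cut only works when the number has no padding in front of it.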
In general it seems to me that the busybox approach to this whole issue would
be either:
A) Skip the optional behavior to save the space and complexity.
B) Make the column alignment code a config option and shared with ls -l instead
of hand-rolled.
I lean towards A myself...
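As a rough illustration of what option A means for the output (a shell sketch of the behavior, not a patch to wc.c), each line can be printed as soon as its count is known, with no padding and no buffering. The function name count_bytes is hypothetical.

```shell
# Hypothetical sketch of option A: stream "count name" lines with no
# column alignment, keeping only a running total.
count_bytes() {
    total=0
    for f in "$@"; do
        n=$(wc -c < "$f") || continue   # skip files that cannot be read
        n=$((n + 0))                    # arithmetic drops any padding wc added
        printf '%s %s\n' "$n" "$f"
        total=$((total + n))
    done
    if [ "$#" -gt 1 ]; then
        printf '%s total\n' "$total"
    fi
}
```

With a single file this prints just the number and the name, which is the form the cut -d' ' -f1 pipeline in the kernel Makefile expects.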
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds