Bug in wc.

Rob Landley rob at landley.net
Mon Mar 8 18:54:05 UTC 2010


On Sunday 07 March 2010 21:07:00 Denys Vlasenko wrote:
> On Monday 08 March 2010 01:48, Rob Landley wrote:
> > And VMLINUX_SIZE is:
> >
> > VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
> > 	cut -d' ' -f1)
> >
> > VMLINUX_SIZE is blank when using busybox tools.
> >
> > The underlying behavioral wonkiness in busybox "wc" is:
> >
> > $ busybox wc -c vmlinux
> >   3335777 vmlinux
> > $ wc -c vmlinux
> > 3335777 vmlinux
> >
> > Note that we have leading whitespace and the GNU version doesn't.  This
> > leading whitespace is confusing the kernel build, because the cut -d' '
> > then triggers on our leading whitespace and produces an empty string,
> > which propagates through the rest of the build to confuse the linker with
> > a start address of "0x".
> >
> > Why do we have unnecessary leading whitespace?  What happened to small and
> > simple and doing no more than absolutely necessary?
>
> Good question, I'm redirecting it to author of busybox-1.2.1 (or earlier)
> since 1.2.1 displays the same behavior. ;)

Actually, I don't remember ever touching wc.  (I'm going to have to fight with 
git, aren't I? I really hate git's UI...)

It looks like wc was completely rewritten to make it much more complicated in 
cad5364599e back in 2003, and it was essentially untouched (if you don't count 
removing trailing whitespace and tweaking the GPL boilerplate) until you added 
a special case in 2006:

commit 3ed001ff2631ad6911096148f47a2719a5b6d4f4
Author: Denis Vlasenko <vda.linux at googlemail.com>
Date:   Fri Sep 29 23:41:04 2006 +0000

    wc: reduce source cruft, make it so that "wc -c" (one option, no filenames)
    will not print leading blanks.

Which would have addressed this problem (and prevented the mips 2.6.33 kernel 
build from breaking) if it weren't a special case that only applies when no 
filename is given.  This cleanup seems to have added complexity rather than 
removing it.

But really, I don't care so much why it's doing what it's doing now as how to 
fix it.  It looks like the other wc holds all output until the end, finds the 
longest number, and pads the shorter ones with spaces to match.  The 
pathological case is (run in the current busybox source tree):

$ wc -c INSTALL README AUTHORS
 5833 INSTALL
 8768 README
 5171 AUTHORS
19772 total

Meaning it has to know all lines before it outputs any, which is really not 
busybox's style.
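
The hold-everything approach described above can be sketched in a few lines
of shell.  This is only an illustration under stated assumptions, not the
code of any real wc: "rows" is hypothetical sample data copied from the
listing above, and "widest" and "align" are made-up helper names.

```shell
# Hypothetical sample data: the counts and names from the listing above.
rows='5833 INSTALL
8768 README
5171 AUTHORS
19772 total'

# Pass 1: read every row and report the width of the longest count.
widest() {
    width=0
    while read -r count name; do
        if [ "${#count}" -gt "$width" ]; then
            width=${#count}
        fi
    done
    echo "$width"
}

# Pass 2: right-align each count to the width given as $1.
align() {
    while read -r count name; do
        printf '%*s %s\n' "$1" "$count" "$name"
    done
}

width=$(printf '%s\n' "$rows" | widest)
printf '%s\n' "$rows" | align "$width"
```

Nothing can be printed until the first pass has seen the last row, which is
exactly the buffering cost being objected to.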

And which really isn't all that _interesting_, to be honest.  I'd be pretty 
happy if we never prepended the space and never tried to line up the columns 
at all.  The longest number will _never_ have prepended space, so anything 
that tries to parse this multi-column output must already cope with a line 
that has no leading whitespace, and must therefore treat leading whitespace 
as _optional_ rather than required.

Where we're getting bit is programs depending on the longest number not having 
any prepended whitespace.  When only one number is output, it is by definition 
the longest number, so it should never have whitespace prepended to it.
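
To make the failure concrete, here is a small sketch of how the two output
styles behave under cut -d' ' -f1, next to a parse that treats leading
whitespace as optional.  The two sample lines are hard-coded from the
output quoted earlier so the result does not depend on which wc is
installed; the variable names are made up for the example.

```shell
# Sample "wc -c" output lines, hard-coded rather than generated.
padded='  3335777 vmlinux'     # leading blanks, as busybox printed it
unpadded='3335777 vmlinux'     # no leading blanks, as GNU printed it

# cut treats every single space as a delimiter, so field 1 of the padded
# line is the empty string in front of the first space.
cut_padded=$(printf '%s\n' "$padded" | cut -d' ' -f1)
cut_unpadded=$(printf '%s\n' "$unpadded" | cut -d' ' -f1)

# read splits on runs of whitespace and drops leading blanks, so it
# accepts both forms.
read -r split_padded rest <<EOF
$padded
EOF

echo "cut on padded line:    '$cut_padded'"
echo "cut on unpadded line:  '$cut_unpadded'"
echo "read on padded line:   '$split_padded'"
```

The first echo shows an empty string, which is what ends up in
VMLINUX_SIZE; the other two show 3335777.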

In general it seems to me that the busybox approach to this whole issue would 
be either:

A) Skip the optional behavior to save the space and complexity.
B) Make the column alignment code a config option and share it with ls -l 
instead of hand rolling it.

I lean towards A myself...

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds

