Bug in wc.
Rob Landley
rob at landley.net
Tue Mar 9 02:56:03 UTC 2010
On Monday 08 March 2010 12:03:03 Bernhard Reutner-Fischer wrote:
> On Sun, Mar 07, 2010 at 06:48:50PM -0600, Rob Landley wrote:
> >The busybox "wc" command doesn't work to build mips in 2.6.33. Kernel
> > commit
> >
> >VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
> > cut -d' ' -f1)
>
> cool stuff. I guess
> VMLINUX_SIZE := $(firstword $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null))
> or 'stat -c %s' would have been too simple? Perhaps you can
> suggest this to the kernel folks.
You could suggest it to the kernel folks just as easily. My plate is already
full of suggestions headed that way, and Peter Anvin has (repeatedly) accused
my build environment of being a one-person experiment of no interest to the
rest of the world.
> >VMLINUX_SIZE is blank when using busybox tools.
> >
> >The underlying behavioral wonkiness in busybox "wc" is:
> >
> >$ busybox wc -c vmlinux
> > 3335777 vmlinux
> >$ wc -c vmlinux
> >3335777 vmlinux
>
> And yes, that should be fixed too, let's just do away with the space
> offsets altogether (but that _will_ break folks who | cut -c10- wc of
> course).
http://www.opengroup.org/onlinepubs/9699919799/utilities/wc.html
By default, the standard output shall contain an entry for each input file of
the form:
"%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>
I.e. no leading space, only one space between each thingy, and if that's what
we do we can hide behind SUSv4. :)
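To see why the kernel's `cut -d' ' -f1` comes up empty, here is a minimal C sketch of two ways to pull the first field out of a `wc -c` line. The helpers `cut_field1()` and `first_word()` are hypothetical illustrations, not code from busybox, coreutils, or make. The first splits on every single delimiter character, the way `cut -d' '` treats its input; the second skips leading whitespace first, roughly what make's `$(firstword ...)` ends up doing with `$(shell ...)` output. Given busybox's leading space, field 1 of the first kind is the empty string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy field 1 of line into out, splitting on every occurrence of delim
 * (like `cut -d' ' -f1`). A leading delimiter yields an empty field. */
static void cut_field1(const char *line, char delim, char *out, size_t outsz)
{
    size_t n = 0;

    assert(outsz > 0);
    while (line[n] && line[n] != delim && line[n] != '\n' && n + 1 < outsz) {
        out[n] = line[n];
        n++;
    }
    out[n] = '\0';
}

/* Copy the first whitespace-separated word, skipping leading blanks
 * (roughly what $(firstword ...) sees after make splits shell output). */
static void first_word(const char *line, char *out, size_t outsz)
{
    size_t n = 0;

    assert(outsz > 0);
    while (isspace((unsigned char)*line))
        line++;
    while (*line && !isspace((unsigned char)*line) && n + 1 < outsz)
        out[n++] = *line++;
    out[n] = '\0';
}
```

With the SUSv4 format (no leading space, single separators) both helpers return the byte count; only the whitespace-skipping one also copes with busybox's padded output.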
I've already written a toybox version, since it was easier for me to write
that from scratch than try to wrestle with the busybox one. The new toybox
command is 98 lines long (1737 bytes of source) and the existing busybox one
is 206 lines (4974 bytes of source) in current git.
Plus, implementing a new toybox command involves adding one file to one
directory; the build automatically picks it up and dynamically generates
everything else from it.
Here is the new toybox file in its entirety:
/* vi: set sw=4 ts=4:
 *
 * wc.c - word count
 *
 * Copyright 2010 Rob Landley <rob at landley.net>
 *
 * See http://www.opengroup.org/onlinepubs/9699919799/utilities/wc.html

USE_WC(NEWTOY(wc, "mLcwl", TOYFLAG_USR|TOYFLAG_BIN))

config WC
    bool "wc"
    default y
    help
      Count words, lines, and/or bytes in files or stdin.

      usage: wc [-Lclw] [file...]

      -l Line count
      -w Word count
      -c Byte count
      -L Longest line
*/
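#include "toys.h"

DEFINE_GLOBALS(
    long count[4];
    long lines;
)

#define TT this.wc

// Print the selected counts for one file (or the "total" line), separated
// by single spaces as SUSv4 specifies, and add them to the running totals.
static void print_results(long *count, char *name)
{
    int i, space = 0;

    for (i=0; i<4; i++) {
        // Bit i of optflags selects count[i]: l, w, c, L.
        if (toys.optflags & (1<<i)) {
            if (space++) xputc(' ');
            printf("%ld", count[i]);
        }
        // The total for -L is the longest line overall, not a sum.
        if (i == 3) {
            if (count[3] > TT.count[3]) TT.count[3] = count[3];
        } else TT.count[i] += count[i];
    }
    // Don't print a name column for stdin ("-").
    if (strcmp("-", name)) printf(" %s", name);
    xputc('\n');
}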
static void do_wc(int fd, char *name)
{
    long count[5];  // l, w, c, L, plus the length of the current line
    int i, len, space=1;

    memset(count, 0, sizeof(count));
    for (;;) {
        len = read(fd, toybuf, sizeof(toybuf));
        if (len<0) {
            perror_msg("%s",name);
            toys.exitval = EXIT_FAILURE;
        }
        if (len<1) break;

        // Loop through the data
        for (i=0; i<len; i++) {
            // isspace() needs an unsigned char value, not a negative char.
            unsigned char ch = toybuf[i];

            // Increment c (byte count) always.
            count[2]++;

            // Increment w when a space follows a non-space.
            if (isspace(ch)) {
                if (!space) count[1]++;
                space = 1;
            } else space=0;

            if (ch == '\n') {
                // Handle l
                count[0]++;
                // Handle L
                if (count[4]>count[3]) count[3]=count[4];
                count[4]=0;
            } else count[4]++;
        }
    }

    // A last word or line with no trailing newline still counts.
    if (!space) count[1]++;
    if (count[4]>count[3]) count[3]=count[4];

    // Print out the results
    print_results(count, name);
    TT.lines++;
}
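void wc_main(void)
{
    // No -l, -w, -c, or -L given: default to -lwc (bits 0-2).
    if (!(toys.optflags&15)) toys.optflags = 7;
    loopfiles(toys.optargs, do_wc);
    // Print a total line only when more than one file was counted.
    if (TT.lines>1) print_results(TT.count, "total");
}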
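In the code above, print_results() tests bit i of toys.optflags against count[i], and wc_main() defaults the flags to 7. That only lines up if the last character of the option string "mLcwl" gets bit 0, the first gets the highest bit. The sketch below models just that mapping; `flag_bit()` is a hypothetical helper, not toybox's actual option parser, and it assumes an option string of plain letters with no argument markers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the flag bit for option character c, counting from
 * the right end of optstr (last character = bit 0). Returns 0 if c is not
 * an option letter in optstr. */
static unsigned flag_bit(const char *optstr, char c)
{
    const char *p = strchr(optstr, c);

    if (!p || !c)
        return 0;
    return 1u << (strlen(optstr) - 1 - (size_t)(p - optstr));
}
```

Under this mapping -l, -w, -c, -L are bits 0 to 3, matching count[0..3], and -m lands on bit 4, outside the `&15` mask, so `wc -m` on its own falls back to the default -lwc output.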
To add an applet to busybox, you need to add the actual .c file, and modify
applets.h, and usage.h, and modify the appropriate Config.in, and modify the
appropriate Kbuild file, and while we're at it why not touch
docs/busybox_footer.pod.
I am somewhat curious why my wc says the toybox binary is 2823 words long, the
GNU wc says it's 2761 words long, and the busybox one says it's 2755 words
long. Then again the spec doesn't say _how_ you indicate "word", so... (I'm
just counting !isspace() followed by isspace(), which seemed fairly
straightforward...) The -c and -l fields are consistent, though. (Still
debugging -L in mine.)
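The word rule in the toybox code counts a word each time an isspace() byte follows a non-space byte. A minimal standalone sketch of that rule, assuming the default C locale, makes it easy to check on small inputs; `count_words()` is a hypothetical helper, not the toybox function. It casts each byte to unsigned char before calling isspace(), because passing a negative char (any byte above 0x7f where char is signed, which binaries are full of) is undefined behavior, and it counts a final word that has no trailing whitespace. It does not try to reproduce GNU's or busybox's counts.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count words in buf[0..len) as runs of
 * non-isspace() bytes. */
static long count_words(const char *buf, size_t len)
{
    long words = 0;
    int in_word = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        /* isspace() takes an unsigned char value (or EOF). */
        if (isspace((unsigned char)buf[i])) {
            if (in_word) words++;   /* a word just ended */
            in_word = 0;
        } else in_word = 1;
    }
    if (in_word) words++;           /* last word had no trailing space */
    return words;
}
```

For example, "hello world" with or without a trailing newline gives 2, and a buffer of only blanks gives 0.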
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds