Bug in wc.

Rob Landley rob at landley.net
Tue Mar 9 02:56:03 UTC 2010


On Monday 08 March 2010 12:03:03 Bernhard Reutner-Fischer wrote:
> On Sun, Mar 07, 2010 at 06:48:50PM -0600, Rob Landley wrote:
> >The busybox "wc" command doesn't work to build mips in 2.6.33.  Kernel
> > commit
> >
> >VMLINUX_SIZE := $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null | \
> >	cut -d' ' -f1)
>
> cool stuff. I guess
> VMLINUX_SIZE := $(firstword $(shell wc -c $(objtree)/$(KBUILD_IMAGE) 2>/dev/null))
> or 'stat -c %s' would have been too simple? Perhaps you can
> suggest this to the kernel folks.

You could suggest it to the kernel folks just as easily.  I have my plate full 
with suggestions that way, and Peter Anvin has already (repeatedly) accused my 
build environment of being a one-person experiment of no interest to the rest 
of the world.

> >VMLINUX_SIZE is blank when using busybox tools.
> >
> >The underlying behavioral wonkiness in busybox "wc" is:
> >
> >$ busybox wc -c vmlinux
> >  3335777 vmlinux
> >$ wc -c vmlinux
> >3335777 vmlinux
>
> And yes, that should be fixed too, let's just do away with the space
> offsets altogether (but that _will_ break folks who pipe wc into cut -c10-,
> of course).

http://www.opengroup.org/onlinepubs/9699919799/utilities/wc.html

  By default, the standard output shall contain an entry for each input file of
  the form:

  "%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>

I.e. no leading space and only one space between each field, so if that's 
what we do we can hide behind SUSv4. :)

I've already written a toybox version, since it was easier for me to write 
that from scratch than try to wrestle with the busybox one.  The new toybox 
command is 98 lines long (1737 bytes of source) and the existing busybox one 
is 206 lines (4974 bytes of source) in current git.

Plus, implementing a new toybox command involves adding one file to one 
directory; the build picks it up automatically and dynamically generates 
everything else from it.

Here is the new toybox file in its entirety:

/* vi: set sw=4 ts=4:
 *
 * wc.c - word count
 *
 * Copyright 2010 Rob Landley <rob at landley.net>
 *
 * See http://www.opengroup.org/onlinepubs/9699919799/utilities/wc.html

USE_WC(NEWTOY(wc, "mLcwl", TOYFLAG_USR|TOYFLAG_BIN))

config WC
	bool "wc"
	default y
	help
	  Count words, lines, and/or bytes in files or stdin.

	  usage: wc [-Lclw] [file...]

	  -l	Line count
	  -w	Word count
	  -c	Byte count
	  -L	Longest line
*/

#include "toys.h"

DEFINE_GLOBALS(
	long count[4];
	long lines;
)

#define TT this.wc

static void print_results(long *count, char *name)
{
	int i, space = 0;

	for (i=0; i<4; i++) {
		if (toys.optflags & (1<<i)) {
			if (space++) xputc(' ');
			printf("%ld", count[i]);
		}
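		// The L total is a maximum across files, not a sum
		if (i == 3) {
			if (count[i] > TT.count[i]) TT.count[i] = count[i];
		} else TT.count[i] += count[i];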
	}

	if (strcmp("-", name)) printf(" %s", name);
	xputc('\n');
}

static void do_wc(int fd, char *name)
{
	long count[5];	// lwcL plus a count for current L
	int i, len, space=1;

	bzero(count, 5*sizeof(long));

	for (;;) {
		len = read(fd, toybuf, sizeof(toybuf));
		if (len<0) {
			perror_msg("%s",name);
			toys.exitval = EXIT_FAILURE;
		}
		if (len<1) break;

		// Loop through the data
		for (i=0; i<len; i++) {

			// increment c always
			count[2]++;

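			// increment w when a non-space follows a space (start of a word),
			// so a last word with no trailing whitespace is still counted.
			// The cast matters: isspace() of a negative char is undefined.
			if (isspace((unsigned char)toybuf[i])) space = 1;
			else {
				if (space) count[1]++;
				space = 0;
			}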

			if (toybuf[i] == '\n') {
				// Handle l
				(*count)++;
				// Handle L
				if (count[4]>count[3]) count[3]=count[4];
				count[4]=0;
			} else count[4]++;
		}
	}

	// Handle L for a last line that has no trailing newline
	if (count[4]>count[3]) count[3]=count[4];

	// Print out the results
	print_results(count, name);
	TT.lines++;
}

void wc_main(void)
{
	if (!(toys.optflags&15)) toys.optflags = 7;
	loopfiles(toys.optargs, do_wc);
	if (TT.lines>1) print_results(TT.count, "total");
}

To add an applet to busybox, you need to add the actual .c file, and modify 
applets.h, and usage.h, and modify the appropriate Config.in, and modify the 
appropriate Kbuild file, and while we're at it why not touch 
docs/busybox_footer.pod.

I am somewhat curious why my wc says the toybox binary is 2823 words long, the 
GNU wc says it's 2761 words long, and the busybox one says it's 2755 words 
long.  Then again the spec doesn't say _how_ you indicate "word", so...  (I'm 
just using isspace() followed by !isspace(), which seemed fairly 
straightforward...)  The -c and -l fields are consistent, though.  (I'm still 
debugging -L in mine.)

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds

