fast getline for busybox (proof of concept, untested)

Rich Felker dalias at aerifal.cx
Wed May 31 20:55:46 UTC 2006


here is some quick code i whipped up in an attempt to replace
bb_get_chomped_line_from_file et. al. with code that is both portable
and extremely fast. the bb_get_chomped_line_from_file function is
currently the main performance bottleneck for many bb applets due to
repeatedly calling getc rather than using fgets.

the attached proof-of-concept code shows how fgets may be used
repeatedly with realloc to fetch arbitrary-length lines from the
stream, and how it may be used even in the presence of embedded nul
bytes (if anyone cares to support that) with mostly-negligable
performance penalty (assuming the implementation's memset and memchr
are fast).

hopefully the code actually works; it's untested. :)

-rich


p.s. yes the fgets_embedded_nul stuff looks hackish but according to
my reading of iso c it does not depend on any behavior outside of what
iso c specifies for the behavior of fgets. whether there are buggy
implementations that don't work with this, i don't know, but it seems
doubtful since any deviation would almost have to be intentional.


-------------- next part --------------
#include <stdio.h>
#include <string.h>
#include <libbb.h>

size_t fgets_embedded_nul(char *s, size_t n, FILE *f)
{
	char *p, *z;
	memset(s, 1, n);
	if (!fgets(s, n, f)) return 0;
	for (p=s; z = memchr(p, 0, n); n -= ++z-p, p=z);
	return p-s-1;
}

char *bb_getline_embedded_nul(FILE *f, size_t *n)
{
	size_t len = 0, size = 127, l;
	char *buf = bb_xmalloc(size);

	for (;;) {
		if (!(l = fgets_embedded_nul(buf+len, size-len, f))) break;
		if (l < size-len-1 || buf[size-2] == '\n') {
			len += l;
			break;
		} else len = size - 1;
		// safe: SIZE_MAX allocation will fail before overflow
		size = 2*size+1;
		buf = bb_xrealloc(buf, size);
	}

	if (!(*n = len)) {
		free(buf);
		return NULL;
	}

	return buf;
}

char *bb_getline(FILE *f)
{
	size_t len = 0, size = 127;
	char *buf = bb_xmalloc(size);

	for (;;) {
		buf[size-1] = 1;
		if (!fgets(buf+len, size-len, f)) break;
		if (buf[size-1] || buf[size-2] == '\n') {
			len += strlen(buf+len);
			break;
		} else len = size - 1;
		// safe: SIZE_MAX allocation will fail before overflow
		size = 2*size+1;
		buf = bb_xrealloc(buf, size);
	}

	if (!len) {
		free(buf);
		return NULL;
	}

	return buf;
}


More information about the busybox mailing list