rewriting config_*, xmalloc_fget*

Timo Teräs timo.teras at iki.fi
Thu Jun 16 04:51:55 UTC 2011


On 06/16/2011 04:51 AM, Rich Felker wrote:
> On Wed, Jun 15, 2011 at 04:47:39PM +0300, Timo Teräs wrote:
>>> In any case, the difference is *very* small at least on x86 (including
>>> SMP/multicore). The cost of the "lock xchg" vs "if (threads>1)" is
>>> something like 50 cycles, and if I remember right, making my
>>> implementation always perform locking only slowed it down by about
>>> 20%. (I say only because this translates into very small difference in
>>> total program time unless your program is "for (;;) free(malloc(1));")
>>>
>>> Perhaps the cost is much larger on other archs where atomics are more
>>> expensive...
>>
>> The figures depend on the specific application.
>>
>> But still, why do malloc() ... free() in relatively tight inner loop, if
>> there's a clean way to avoid it? I don't see point doing "fast things"
>> when we don't have to do them at all. When reading 100.000 or a million
>> lines file, avoiding that many malloc/free calls (or more) is visible on
>> execution time.
> 
> Presumably because you don't know in advance the maximum line size you
> might need. I doubt this really matters for reading config files, but
> I also doubt that I'm qualified to make that decision for a project
> I'm not actually a developer on..
> 
> By the way, getdelim normally can reuse an existing buffer as long as
> you keep track of the size of it; it only has to reallocate if it
> needs more space.

Uh, yes! That's exactly how I wanted to change config_* API. So that it
automatically allocates/reallocs the buffer, but can reuse previously
allocated line buffer. Just what getdelim does. The current config_* API
does not allow that.

>> Though, in bb config_* API, getdelim isn't always enough. They seem to
>> need a special line reader that treats both \0 and \n as line terminator.
> 
> Why? \0 is not valid in text files and certainly shouldn't appear in
> config files... What is it needed for?

Well, that's the way they just work upstream, and you get regression if
you try to change it. I agree on that most applets do not need it, and
might actually be sort of broken for treating \0 as new line now.

Notably, though, at least sed.c seems to rely on the fact that \n and \0
are both recognized, and jumps through many hoops to handle them both
properly. Apparently there's out some sed scripts that use both \n and
\0 as statement separator, but they have different meaning.

Not sure if there's other applets relying on this.

But just like I told before, this also reason why the API needs rewrite,
it should *not* treat both as line separators, unless there's some
explicit need.


More information about the busybox mailing list