[PATCH 1/2] find: use sysconf(_SC_ARG_MAX) to determine the command-line size limit

Ralf Friedl Ralf.Friedl at online.de
Mon Jun 23 14:45:37 UTC 2014


Bartosz Gołaszewski wrote:
> 2014-06-22 13:49 GMT+02:00 Denys Vlasenko <vda.linux at googlemail.com>:
>>> +     IF_FEATURE_FIND_EXEC_PLUS(G.max_argv_len = BB_ARG_MAX;) \
>> The "- 2048" part should be here, not inside BB_ARG_MAX
>> (not all users will want that subtraction).
> But then the user would subtract these bytes even when using ARG_MAX
> or 32*1024 - when it's not needed as ARG_MAX already includes space
> for the environment.
According to a comment from findutils/xargs.c the 2048 comes from the 
standard that for some reason limits the size of argv+environ to 
ARG_MAX-2048 bytes.

ARG_MAX is the total space available for all arguments. All arguments 
include the environment, so the space needed for the environment is not 
available for argv. Of course the space needed for the environment may 
be smaller or much larger than 2048, so the 2048 is just a hack. The 
correct approach would be to determine the size of the actual 
environment and use that.
How the space used by each argument is counted is not defined in the standard: "It is 
implementation-defined whether null terminators, pointers, and/or any 
alignment bytes are included in this total."
Two likely implementations are to pass only the strings and reconstruct 
the pointers to the strings after that, or to pass the strings and the 
pointers (argv and environ). That means each string needs either 
strlen()+1 bytes or strlen()+1+sizeof(char*) bytes, so the latter would 
be safe to assume.
At the moment, busybox xargs completely ignores the environment, so the 
reason that xargs seems to work is that either 2048 is enough for the 
environment (not likely) or that 32*1024 is an upper bound and not a 
lower bound for the argument size:
   n_max_chars = 32 * 1024;
   arg_max = sysconf(_SC_ARG_MAX) - 2048;
   if (arg_max > 0 && n_max_chars > arg_max)
     n_max_chars = arg_max;
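
As a rough sketch of the approach described above (measure the actual 
environment instead of subtracting a fixed 2048), something like the 
following might work. The helper names arg_cost(), environ_size() and 
argv_space_left() are made up for illustration and are not existing 
busybox functions. The per-string cost assumes the pessimistic 
strlen()+1+sizeof(char*) accounting.

```c
#include <string.h>
#include <unistd.h>

extern char **environ;

/* Worst-case cost of one string: its bytes, the NUL terminator,
 * and the pointer to it (assumed accounting, see above). */
static size_t arg_cost(const char *s)
{
	return strlen(s) + 1 + sizeof(char *);
}

/* Space taken by the current environment, including the
 * terminating NULL pointer of the environ array. */
static size_t environ_size(void)
{
	size_t total = sizeof(char *);
	char **e;

	for (e = environ; e && *e; e++)
		total += arg_cost(*e);
	return total;
}

/* Hypothetical helper: bytes left for argv after the environment,
 * or 0 if the limit is unknown or already used up. */
static size_t argv_space_left(void)
{
	long arg_max = sysconf(_SC_ARG_MAX);
	size_t env = environ_size();

	if (arg_max <= 0 || (size_t)arg_max <= env)
		return 0;
	return (size_t)arg_max - env;
}
```

A caller would then add up arg_cost() for each argument and start a new 
command line once the total would exceed argv_space_left().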

Are there any systems in use today where ARG_MAX is smaller than 32kB?

>>> +
>>> +     /* TODO Introduce a growable buffer and use BB_ARG_MAX macro to
>>> +      * determine a safe value for n_max_chars. Remove this check afterwards.
>>> +      */
>> I don't understand this comment.
> Before commit f92f1d0 there was a comment in findutils/xargs.c:
>
> /* -s NUM default. fileutils-4.4.2 uses 128k, but I heasitate
>   * to use such a big value - first need to change code to use
>   * growable buffer instead of fixed one.
>
> The small value of 32*1024 is used here because the command-line
> buffer is allocated in a single kzalloc() call at line 559. This is
> why we do ' n_max_chars = 32 * 1024;'  at line 536 if the value
> returned by bb_arg_max() is greater than 32*1024, and - for now -
> 'n_max_chars -= 2048;' at line 541 is not needed.
>
> If we were to use bb_arg_max() we'd need to use a growable vector just
> like 'find -exec {} +' does right now.
The buffer is allocated with xzalloc and for some reason with size 
(n_max_chars + 1).
Is there a reason to clear the buffer? It is not cleared before reuse 
anyway. So on an MMU machine it would be possible to use a much bigger 
default with xmalloc, and the pages would only be mapped as needed. This 
also wouldn't need any code to grow the buffer, and would also save the 
time needed to copy the contents of the buffer on realloc.
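
To illustrate the idea, here is a minimal sketch of a fixed-capacity 
buffer that is allocated once without clearing and filled sequentially. 
It is not the xargs code. struct arg_buf and its functions are 
hypothetical names, and plain malloc() stands in for xmalloc so the 
example is self-contained. Whether untouched pages really stay unmapped 
depends on the system.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity argument buffer (not busybox code). */
struct arg_buf {
	char *data;
	size_t used;
	size_t cap;
};

/* Allocate without clearing, so untouched pages need not be mapped
 * on systems that map memory lazily. Returns 0 on success, -1 on failure. */
static int arg_buf_init(struct arg_buf *b, size_t cap)
{
	b->data = malloc(cap);
	b->used = 0;
	b->cap = b->data ? cap : 0;
	return b->data ? 0 : -1;
}

/* Append s including its NUL terminator. Returns a pointer to the
 * copy, or NULL if it does not fit in the remaining space. */
static char *arg_buf_add(struct arg_buf *b, const char *s)
{
	size_t n = strlen(s) + 1;
	char *p;

	if (n > b->cap - b->used)
		return NULL;
	p = b->data + b->used;
	memcpy(p, s, n);
	b->used += n;
	return p;
}

/* Reuse the buffer for the next command line: only the fill level
 * is reset, the old contents are simply overwritten. */
static void arg_buf_reset(struct arg_buf *b)
{
	b->used = 0;
}
```

Since the buffer never moves, pointers returned by arg_buf_add() stay 
valid until the next reset, which is what avoids the copy on realloc.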

As the functionality from "xargs" and "find -exec {} +" is quite 
similar, it might be a good idea to define some functions that can be 
used by both commands.

