resolver broken in NPTL build
Timo Teräs
timo.teras at iki.fi
Sun May 16 15:44:35 UTC 2010
On 05/16/2010 07:00 AM, Denys Vlasenko wrote:
> On Wednesday 12 May 2010 09:32, Timo Teräs wrote:
>> The combination of
>>
>> commit aab4df0fb51660300559f5f29290709db2f7bfee
>> resolv.c: add support for per thread res_state
>>
>> commit cca45baf8353d1e338d232f5bdb2d1d6b357f1da
>> /etc/resolv.conf: support "timeout:n" and "attempts:n" options
>>
>> .. and NPTL results in broken resolver in very annoying ways.
>>
>> Now, it seems that most of the uclibc code does not work well
>> if res_state is TLS variable. Technically, this is the correct thing
>> to do since this gives proper per-thread resolving behavior, and
>> it also makes the config options overridable per thread. This
>> probably what apps expect as glibc does it too.
>
> Do you have an example where per-threadness is used?
I cannot really give any example immediately. But I still think
we should stick to the per-threaded behavior. That is what the
specifications say, and if we want to be specs compliant, _res
needs to be in TLS.
>> But alas, most places use _res to sync up static global variables
>> which results in breakage. It gets more or less randomly selected
>> which threads options get applied. Also in case of multiple servers
>> it looks like the retry logic is shared between all threads, e.g.
>> two concurrent resolutions can make other resolvers skip nameservers
>> due to shared "last_ns_num".
>
> I see.
But still having the last_ns_num et al. as global, shared vars
would probably be okay.
>> And finally the timeout/attempts commit breaks the accumulated stuff
>> horribly. What happens is:
>> 1. multithreaded application startups, initializes resolver,
>> resolves things just fine
>> 2. resolv.conf gets changed, application calls res_init
>> after res_init uclibc will call res_sync on all resolver functions
>> to refresh globals from the TLS variable _res
>> 3. res_init was called only in one thread, so other thread's
>> _res contains all zeroes (yes, this is correct app usage:
>> res_init should be called only from one thread)
>> 4. threads not calling res_init get broken resolver due to timeout
>> being set to zero
>>
>> Now, one proper solution would be to:
>> 1. make __open_nameservers return the configuration options
>> 2. pass the config options struct to res_sync_func so it can do
>> the proper overrides from per-thread _res
>> 3. remove the related globals and use locally config options from
>> __open_nameservers
>>
>> But technically, the correct thing (as in glibc does this) is:
>> - res_init increases global "res_init timestamp"
>> - use _res (or pointer within there) for resolver retries etc.
>> get proper per-thread behavior
>> - resolvers functions call "maybe_init" which reinitialize the
>> TLS'ed resolver state (e.g. reload resolv.conf) if their res_init
>> timestamp is out dated
>
> It looks very complex. Can we just use a shared structure?
> That will be simple and small.
>
> Anyone objects to this?
See above. I object having _res a global. Having the other stuff
shared is ok.
>> As an immediate emergency kludge fix, the following might do:
>>
>> resolv: fix options handling for TLS _res
>>
>> If _res is in TLS (NPTL), it might not get initialized. Assume
>> zeroes mean default values.
>>
>> Signed-off-by: Timo Teräs <timo.teras at iki.fi>
>>
>> diff --git a/libc/inet/resolv.c b/libc/inet/resolv.c
>> index 320aec4..f8066d2 100644
>> --- a/libc/inet/resolv.c
>> +++ b/libc/inet/resolv.c
>> @@ -2916,8 +2916,8 @@ static void res_sync_func(void)
>> __nameserver[n].sa4 = rp->nsaddr_list[n]; /*
>> struct copy
>> #endif
>> }
>> - __resolv_timeout = rp->retrans;
>> - __resolv_attempts = rp->retry;
>> + __resolv_timeout = rp->retrans ?: RES_TIMEOUT;
>> + __resolv_attempts = rp->retry ?: RES_DFLRETRY;
>> /* Extend and comment what program is known
>> * to use which _res.XXX member(s).
>
> Comment why this is needed might be good.
Isn't the commit log description enough?
- Timo
More information about the uClibc
mailing list