[uClibc] ldso bug in uclibc?

Andrew de Quincey adq_dvb at lidskialf.net
Mon Nov 8 12:40:20 UTC 2004


On Monday 08 Nov 2004 10:58, Joakim Tjernlund wrote:
> > Hi, now that I have gdb working, I've been tracking down the actual bug
> > that I needed to debug gdb for.
> >
> > It's in the Helix client. Helix consists of a set of shared libraries -
> > plugins. Each plugin .so has a public set of standard C API symbols -
> > CanUnload(), CanUnload2() etc. Each plugin's symbols have the same name.
> >
> > Helix core loads the shared library using libdl and requests these
> > symbols by name. This seems to work fine - if it asks for CanUnload from
> > pluginX.so, it gets the pointer to the method in pluginX.so.
> >
> > A cleanup thread removes unused plugins from memory - if CanUnload()
> > returns 0, it removes the plugin. The problem I'm having is that one of
> > the plugins is unloaded while it is still in use.
> >
> > This plugin has an implementation of the above two methods as follows:
> >
> > int CanUnload2() {
> >  return CanUnload();
> > }
> >
> > int CanUnload() {
> >  /* check stuff and return appropriate value */
> > }
> >
> > I've added tracing: CanUnload2() is called correctly. However, when it
> > calls CanUnload(), the method in a *different* plugin is called - which
> > coincidentally returns 0, causing the one in use to be unloaded. Which
> > doesn't have good results. :)
> >
> > Note that I'm using -fPIC as these are shared libraries on a powerpc 405,
> > so the call to CanUnload from CanUnload2 goes through the PIC.. er table?
> > sorry not sure of the terminology here. I'm guessing the PIC table entry
> > for CanUnload is accidentally being populated with CanUnload from the
> > first .so it loaded.
> >
> > This has to be a bug in uclibc. These plugins have been in Helix for a
> > while, and if they didn't work they would have been fixed ages ago.
> >
> > I'm using a uclibc snapshot from 06.11.2004.
>
> I suspect that Helix does NOT use RTLD_GLOBAL in dlopen(), right?
> libdl cannot handle non RTLD_GLOBAL yet, it will be promoted to
> RTLD_GLOBAL.
>
> The patch "Fix dlsym resolution for powerpc" you just posted to fix this is
> not correct. The problem space for non RTLD_GLOBAL relocation is much more
> complex. You can read more about it here:
> http://www.opengroup.org/onlinepubs/007908799/xsh/dlopen.html

Correct. m_handle = dlopen(dllName, RTLD_LAZY);

Is this not just a matter of having the "struct elf_resolve *tpnt" structures
store the RTLD_GLOBAL/RTLD_LOCAL state (if they don't already)? _dl_find_hash
would then skip any symbol table that is neither the requesting library's own
nor flagged RTLD_GLOBAL. This assumes our "implementation defined behaviour"
for when neither flag is set is to pretend RTLD_LOCAL was passed.

_dl_find_hash knows which symbol table is the local one because we keep the API
change I made - passing the tpnt structure for the library concerned - and
compare the absolute pathnames in the search loop.
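
To make the proposed rule concrete, here is a toy model of the search loop. It
is only a sketch, not uClibc code: struct toy_lib, struct toy_symbol,
TOY_GLOBAL and toy_find() are hypothetical names invented for illustration, and
the model ignores dependencies, weak symbols and hashing. It assumes a library
opened with neither flag is treated as local.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the RTLD_GLOBAL bit; not the real dlfcn.h value. */
#define TOY_GLOBAL 0x1

struct toy_symbol {
    const char *name;
    int value;                      /* stands in for the symbol's address */
};

/* Hypothetical, much-reduced analogue of a loaded-library record. */
struct toy_lib {
    const char *path;               /* absolute pathname of the library */
    int flags;                      /* TOY_GLOBAL if opened with global scope */
    const struct toy_symbol *syms;
    size_t nsyms;
    const struct toy_lib *next;     /* next library in load order */
};

/*
 * Look up `name` on behalf of `requester`. A library is searched only if it
 * is the requester itself (matched by pathname) or was loaded globally.
 */
static const struct toy_symbol *
toy_find(const struct toy_lib *head, const struct toy_lib *requester,
         const char *name)
{
    assert(requester != NULL && name != NULL);
    for (const struct toy_lib *lib = head; lib != NULL; lib = lib->next) {
        int is_requester = strcmp(lib->path, requester->path) == 0;
        if (!is_requester && !(lib->flags & TOY_GLOBAL))
            continue;               /* another library's local scope: skip */
        for (size_t i = 0; i < lib->nsyms; i++)
            if (strcmp(lib->syms[i].name, name) == 0)
                return &lib->syms[i];
    }
    return NULL;
}
```

With two local plugins that both define CanUnload, a lookup made on behalf of
the second plugin skips the first plugin's table and finds its own symbol. If
the first plugin were flagged global, its definition would win because it comes
earlier in load order.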


