Regression caused by commit 7682323a3a798d6f15708f228f859a64cb869aa3

Carmelo AMOROSO carmelo.amoroso at st.com
Tue Jan 17 08:41:17 UTC 2012


On 17/01/2012 2.59, Khem Raj wrote:
> On Mon, Jan 16, 2012 at 1:36 AM, Carmelo AMOROSO <carmelo.amoroso at st.com> wrote:
>> On 16/01/2012 9.09, Carmelo Amoroso wrote:
>>> On 16/01/2012 8.53, Khem Raj wrote:
>>>> On Sun, Jan 15, 2012 at 11:46 PM, Carmelo AMOROSO
>>>> <carmelo.amoroso at st.com> wrote:
>>>>> On 15/01/2012 7.22, Khem Raj wrote:
>>>>>> On Sat, Jan 14, 2012 at 6:10 PM, Khem Raj <raj.khem at gmail.com> wrote:
>>>>>>> On Fri, Jan 13, 2012 at 4:13 PM, Khem Raj <raj.khem at gmail.com> wrote:
>>>>>>>> On Fri, Jan 13, 2012 at 3:45 PM, Khem Raj <raj.khem at gmail.com> wrote:
>>>>>>>>> On Fri, Jan 13, 2012 at 1:37 AM, Carmelo AMOROSO <carmelo.amoroso at st.com> wrote:
>>>>>>>>>>> and since I see the same issue on all architectures, it is probably
>>>>>>>>>>> not the elfinterp changes either. Most likely it is in the way the
>>>>>>>>>>> scopes are being handled
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We reviewed this change several times before committing it. Anyway, we
>>>>>>>>>> will review it again. We have never seen any lookup failure in any of
>>>>>>>>>> our tests. The only change in the way the symbol scope is created is
>>>>>>>>>> where ld.so is added.
>>>>>>>>>> In the original code it was the last entry of the global scope, while
>>>>>>>>>> with the new structure in place it is added as soon as it is found (as
>>>>>>>>>> glibc actually does), and I don't really think this could have any impact.
>>>>>>>>>
>>>>>>>>> I tried to reverse it as well but the problem remained.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We are trying to start up an X system on our platform. Is there any
>>>>>>>>>> simple X app we can run to show the failure?
>>>>>>>>>>
>>>>>>>>>> Is some .so failing to be dlopened due to an unresolved symbol?
>>>>>>>>>
>>>>>>>>> This is possible. I will try to debug it.
>>>>>>>>
>>>>>>>> This is the problem that happens with the new scoping and does not
>>>>>>>> happen without it
>>>>>>>>
>>>>>>>> Error reading Pango modules file
>>>>>>>>
>>>>>>>> (matchbox-desktop:1058): Pango-CRITICAL **: No modules found:
>>>>>>>> No builtin or dynamically loaded modules were found.
>>>>>>>> PangoFc will not work correctly.
>>>>>>>> This probably means there was an error in the creation of:
>>>>>>>>  '/etc/pango/pango.modules'
>>>>>>>> You should create this file by running:
>>>>>>>>  pango-querymodules > '/etc/pango/pango.modules'
>>>>>>>>
>>>>>>>> (matchbox-desktop:1058): Pango-WARNING **: failed to choose a font,
>>>>>>>> expect ugly output. engine-type='PangoRenderFc', script='latin'
>>>>>>>>
>>>>>>>> (matchbox-desktop:1058): Pango-WARNING **: failed to choose a font,
>>>>>>>> expect ugly output. engine-type='PangoRenderFc', script='common'
>>>>>>>
>>>>>>> here is the error
>>>>>>>
>>>>>>> /usr/bin/pango-querymodules: can't resolve symbol
>>>>>>> '_ZNSt14error_categoryD2Ev' in lib '/usr/lib/libstdc++.so.6'.
>>>>>>>
>>>>>>> this does not happen without scope patch
>>>>>>>
>>>>>>> pango-querymodules loads a shared library
>>>>>>> /usr/lib/pango/1.6.0/modules/pango-basic-fc.so using dlopen and this
>>>>>>> library had libstdc++.so.6 in its DT_NEEDED entries
>>>>>>>
>>>>>>> I was trying to create a small testcase: a small binary which
>>>>>>> dlopens another .so that has libstdc++ in its DT_NEEDED entries.
>>>>>>> So far I have not been able to reproduce the problem with it, but I
>>>>>>> am making some progress
>>>>>>
>>>>>>
>>>>>> I might have a test case here: http://uclibc.org/~kraj/reproducer_v2.tar.gz
>>>>>> Untar it on the target, run make, and then ./run.sh
>>>>>>
>>>>>> With the buggy libraries I get
>>>>>> root at qemux86:~/rep/reproducer_v2# ./run.sh
>>>>>> 1)main:dlopen  libA.so
>>>>>> 4)libC:dlopen  libB.so
>>>>>> 5)libC:atexit(libC_fini)
>>>>>> 6)main:dlclose libA.so
>>>>>> /home/root/rep/reproducer_v2/main: can't resolve symbol '_libC_fini'
>>>>>> in lib './/libC.so'.
>>>>>>
>>>>>> whereas without the scopes patch I get
>>>>>>
>>>>>> root at qemux86:~/rep/reproducer_v2# ./run.sh
>>>>>> 1)main:dlopen  libA.so
>>>>>> 4)libC:dlopen  libB.so
>>>>>> 5)libC:atexit(libC_fini)
>>>>>> 6)main:dlclose libA.so
>>>>>> 7)libC:finish - atexit()
>>>>>> 8)main:finish main
>>>>>> root at qemux86:~/rep/reproducer_v2#
>>>>>>
>>>>>>
>>>>>> I think that's the problem I am facing in pango-querymodules as well.
>>>>>> Another data point: if I use BIND_NOW, it works too.
>>>>>>
>>>>>> Let me know if you can reproduce it with this testcase.
>>>>>>
>>>>>> Thanks
>>>>>> -Khem
>>>>>>
>>>>>
>>>>> Thanks, Khem, for your effort in reproducing this.
>>>>> I'll let you know ASAP.
>>>>>
>>>>> We will focus on this 100% from now on.
>>>>>
>>>>> Carmelo
>>>>
>>>> I have a patch (sort of) which fixes this issue; have a look at it.
>>>> The problem is that it's trying to unload sub-scopes after the library
>>>> has been removed from the global scope, so I just delayed the removal
>>>> of the dlopened library.
>>>>
>>>
>>> what is triggering the problem is the use of atexit()
>>>
>>
>> I'd ask: is it correct for a dlopen-ed shared library to install
>> a function via atexit() to be called at program exit, if the shared
>> library could be unloaded at any time during the program's life?
>>
> 
> Does the library know whether it will always be dlopened?
> 

No, obviously it doesn't.

I've re-read the atexit man page. Initially it refers only to the use
of atexit in binaries (hence my doubts), but later, in the Notes, there
is a reference to using atexit in shared libraries, where it acts as a
destructor... so my concerns are invalid.
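
For illustration, the pattern in question looks roughly like the minimal
sketch below. It is not taken from the reproducer: lib_init and lib_fini
are made-up names, and lib_init stands for whatever initialisation path
the library really uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cleanup routine; stands in for something like libC_fini. */
static void lib_fini(void)
{
    puts("lib_fini: releasing library resources");
}

/*
 * Hypothetical init hook of a shared library. It registers lib_fini
 * with atexit() and passes back atexit()'s result (0 on success).
 */
int lib_init(void)
{
    return atexit(lib_fini);
}
```

Calling lib_init() once is enough to register the handler. Whether the
handler then runs when the library is unloaded or only at process exit
is up to the C library implementation.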

>> I'd say that with the old code it was just working by luck!
>>
>> The shared library image is actually unmapped from the system, so why
>> should we expect some of its symbols to still be alive?
>>
> 
> How about the dependencies that it loaded?
> 

Again, I was wrong. Looking at the code more carefully, in the loop
where the library (with its dependencies) is unloaded, the destructors
are indeed called at the beginning, before unmapping the DSO and before
removing it from _dl_loaded_modules and the symbol tables...
so it works.
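
The ordering described above can be shown with a toy model. This is only
a sketch under my own assumptions, not the libdl code: struct module,
the loaded list, unload() and run_demo() are all invented for the
example, and the relative order of "unlist" and "unmap" is an arbitrary
choice here. The only point is that the destructor runs first, while the
module is still listed and mapped.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a loaded DSO; not uClibc's real data structure. */
struct module {
    const char *name;
    void (*fini)(void);     /* destructor, may be NULL */
    int mapped;             /* 1 while the image is "mapped" */
    struct module *next;
};

static struct module *loaded;   /* toy list of loaded modules */
static int fini_saw_module;     /* set by the demo destructor */

static struct module *find_module(const char *name)
{
    for (struct module *m = loaded; m != NULL; m = m->next)
        if (strcmp(m->name, name) == 0)
            return m;
    return NULL;
}

static void list_remove(struct module *victim)
{
    for (struct module **pp = &loaded; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            victim->next = NULL;
            return;
        }
    }
}

/* Unload: destructor first, then unlist, then "unmap". */
void unload(struct module *m)
{
    if (m->fini != NULL)
        m->fini();          /* 1. runs while still listed and mapped */
    list_remove(m);         /* 2. drop from the lookup list */
    m->mapped = 0;          /* 3. mark the image as unmapped */
}

/* Demo destructor: records whether its own module was still usable. */
static void libC_fini(void)
{
    struct module *self = find_module("libC.so");
    fini_saw_module = (self != NULL && self->mapped);
}

/* Returns 1 if the destructor ran before the module disappeared. */
int run_demo(void)
{
    static struct module libC = { "libC.so", libC_fini, 1, NULL };

    libC.mapped = 1;
    libC.next = NULL;
    loaded = &libC;
    fini_saw_module = 0;

    unload(&libC);
    return fini_saw_module && find_module("libC.so") == NULL && !libC.mapped;
}
```

If steps 1 and 2 in unload() were swapped, libC_fini would no longer
find its own module in the list, which is the toy equivalent of the
"can't resolve symbol" failure.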

>>>> http://www.uclibc.org/~kraj/fix_libdl.patch
>>>>
>>>
>>> looking at it
>>
>> Setting aside the concerns about the use of atexit, this patch is
>> correct. Could we avoid using the unlink_local_scope guard and test the
>> stored_ls pointer directly?
>>

Please install your patch; it is definitely correct.

>> carmelo
> 

cheers,
carmelo

