Regression caused by commit 7682323a3a798d6f15708f228f859a64cb869aa3

Carmelo AMOROSO carmelo.amoroso at st.com
Mon Jan 16 17:53:19 UTC 2012


On 16/01/2012 10.36, Carmelo Amoroso wrote:
> On 16/01/2012 9.09, Carmelo Amoroso wrote:
>> On 16/01/2012 8.53, Khem Raj wrote:
>>> On Sun, Jan 15, 2012 at 11:46 PM, Carmelo AMOROSO
>>> <carmelo.amoroso at st.com> wrote:
>>>> On 15/01/2012 7.22, Khem Raj wrote:
>>>>> On Sat, Jan 14, 2012 at 6:10 PM, Khem Raj <raj.khem at gmail.com> wrote:
>>>>>> On Fri, Jan 13, 2012 at 4:13 PM, Khem Raj <raj.khem at gmail.com> wrote:
>>>>>>> On Fri, Jan 13, 2012 at 3:45 PM, Khem Raj <raj.khem at gmail.com> wrote:
>>>>>>>> On Fri, Jan 13, 2012 at 1:37 AM, Carmelo AMOROSO <carmelo.amoroso at st.com> wrote:
>>>>>>>>>> and since I see the same issue on all architectures, it is probably
>>>>>>>>>> not the elfinterp changes either. It seems most likely to be in the
>>>>>>>>>> way the scopes are being handled
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We reviewed this change several times before committing. Anyway, we
>>>>>>>>> will review it again. We have never seen any failure in the lookup
>>>>>>>>> with any of our tests. The only change in the way the symbol scope is
>>>>>>>>> created is where ld.so is added.
>>>>>>>>> In the original code it was the last entry of the global scope, while
>>>>>>>>> with the new structure in place it is added as soon as it is found (as
>>>>>>>>> glibc actually does)... and I don't really think this could have any impact.
>>>>>>>>
>>>>>>>> I tried to reverse it as well but the problem remained.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> We are trying to start up an X system on our platform. Is there any simple
>>>>>>>>> X app we can run to show the failure?
>>>>>>>>>
>>>>>>>>> Is some .so failing to be dlopened due to an unresolved symbol?
>>>>>>>>
>>>>>>>> this is potentially possible. I will try to debug it through
>>>>>>>
>>>>>>> This is the problem that happens with the new scoping and does not
>>>>>>> happen without it
>>>>>>>
>>>>>>> Error reading Pango modules file
>>>>>>>
>>>>>>> (matchbox-desktop:1058): Pango-CRITICAL **: No modules found:
>>>>>>> No builtin or dynamically loaded modules were found.
>>>>>>> PangoFc will not work correctly.
>>>>>>> This probably means there was an error in the creation of:
>>>>>>>  '/etc/pango/pango.modules'
>>>>>>> You should create this file by running:
>>>>>>>  pango-querymodules > '/etc/pango/pango.modules'
>>>>>>>
>>>>>>> (matchbox-desktop:1058): Pango-WARNING **: failed to choose a font,
>>>>>>> expect ugly output. engine-type='PangoRenderFc', script='latin'
>>>>>>>
>>>>>>> (matchbox-desktop:1058): Pango-WARNING **: failed to choose a font,
>>>>>>> expect ugly output. engine-type='PangoRenderFc', script='common'
>>>>>>
>>>>>> here is the error
>>>>>>
>>>>>> /usr/bin/pango-querymodules: can't resolve symbol
>>>>>> '_ZNSt14error_categoryD2Ev' in lib '/usr/lib/libstdc++.so.6'.
>>>>>>
>>>>>> This does not happen without the scope patch.
>>>>>>
>>>>>> pango-querymodules loads a shared library,
>>>>>> /usr/lib/pango/1.6.0/modules/pango-basic-fc.so, using dlopen, and this
>>>>>> library has libstdc++.so.6 in its DT_NEEDED entries.
>>>>>>
>>>>>> I tried to create a small testcase: a small binary which dlopens
>>>>>> another .so that has libstdc++ in DT_NEEDED in its header. I have not
>>>>>> been able to reproduce the failure that way yet, but I am making some
>>>>>> progress.
>>>>>
>>>>>
>>>>> I might have a test case here: http://uclibc.org/~kraj/reproducer_v2.tar.gz
>>>>> Untar it on the target, run make, and then run ./run.sh
>>>>>
>>>>> With the buggy libraries I get
>>>>> root@qemux86:~/rep/reproducer_v2# ./run.sh
>>>>> 1)main:dlopen  libA.so
>>>>> 4)libC:dlopen  libB.so
>>>>> 5)libC:atexit(libC_fini)
>>>>> 6)main:dlclose libA.so
>>>>> /home/root/rep/reproducer_v2/main: can't resolve symbol '_libC_fini'
>>>>> in lib './/libC.so'.
>>>>>
>>>>> whereas without the scopes patch I get
>>>>>
>>>>> root@qemux86:~/rep/reproducer_v2# ./run.sh
>>>>> 1)main:dlopen  libA.so
>>>>> 4)libC:dlopen  libB.so
>>>>> 5)libC:atexit(libC_fini)
>>>>> 6)main:dlclose libA.so
>>>>> 7)libC:finish - atexit()
>>>>> 8)main:finish main
>>>>> root@qemux86:~/rep/reproducer_v2#
>>>>>
>>>>>
>>>>> I think that's the problem I am facing in pango-querymodules as well.
>>>>> Another data point: if I use BIND_NOW, it works too.
>>>>>
>>>>> Let me know if you can reproduce it with this testcase.
>>>>>
>>>>> Thanks
>>>>> -Khem
>>>>>
>>>>
>>>> Thanks, Khem, for your effort in reproducing this.
>>>> I'll let you know ASAP.
>>>>
>>>> We will focus on this 100% from now on.
>>>>
>>>> Carmelo
>>>
>>> I have a patch (sort of) which fixes this issue; have a look at it.
>>> The problem is that it's trying to unload sub-scopes after it has been
>>> removed from the global scope, so I just delayed the removal of the
>>> dlopened library.
>>>
>>
>> what is triggering the problem is the use of atexit()
>>
> 
> I'd ask: is it correct for a dlopen-ed shared library to install
> a function via atexit() to be called at program exit, if the shared
> library could be unloaded at any time during the program's life?
> 
> I'd say that with the old code it was just working by luck!
> 
> The shared library image is actually unmapped from the system, so why
> should we expect some of its symbols to still be alive?
> 
>>> http://www.uclibc.org/~kraj/fix_libdl.patch
>>>
>>
>> looking at it
> 
> Leaving aside the concerns about the use of atexit, this patch is
> correct. Could we avoid using the unlink_local_scope guard and test the
> stored_ls pointer directly?
> 
> carmelo

Hmm... I am still wondering how the original lookup mechanism worked!?
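
For reference, here is a toy model of the ordering issue as I read Khem's
description: a lazily bound symbol can only be resolved while its library is
still in the scope being searched, so removing the library from the scope
before running its teardown makes the lookup fail. This is only a sketch under
that assumption. It is not the uClibc code, and every name in it (toy_lib,
toy_scope, toy_close, remove_first) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these types and functions are hypothetical,
 * they are NOT uClibc's data structures. */
#define MAX_LIBS 8

struct toy_lib {
    const char *name;    /* e.g. "libC.so" */
    const char *symbol;  /* the single symbol this library exports */
};

struct toy_scope {
    const struct toy_lib *libs[MAX_LIBS];
    size_t count;
};

/* Append a library to the end of the scope. */
static void scope_add(struct toy_scope *s, const struct toy_lib *lib)
{
    assert(s->count < MAX_LIBS);
    s->libs[s->count++] = lib;
}

/* Remove a library from the scope, keeping the order of the others. */
static void scope_remove(struct toy_scope *s, const struct toy_lib *lib)
{
    for (size_t i = 0; i < s->count; i++) {
        if (s->libs[i] == lib) {
            memmove(&s->libs[i], &s->libs[i + 1],
                    (s->count - i - 1) * sizeof s->libs[0]);
            s->count--;
            return;
        }
    }
}

/* Lazy-binding style lookup: walk the scope and return the first
 * library exporting the symbol, or NULL if nothing matches. */
static const struct toy_lib *scope_lookup(const struct toy_scope *s,
                                          const char *sym)
{
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->libs[i]->symbol, sym) == 0)
            return s->libs[i];
    }
    return NULL;
}

/* Model of closing a library whose teardown needs its own symbol
 * resolved lazily. Returns 1 if the symbol resolved, 0 otherwise.
 * remove_first != 0: leave the scope first, then look up (fails).
 * remove_first == 0: look up first, then leave the scope (works). */
static int toy_close(struct toy_scope *s, const struct toy_lib *lib,
                     int remove_first)
{
    int resolved;

    if (remove_first) {
        scope_remove(s, lib);
        resolved = scope_lookup(s, lib->symbol) != NULL;
    } else {
        resolved = scope_lookup(s, lib->symbol) != NULL;
        scope_remove(s, lib);
    }
    return resolved;
}
```

In this model, calling toy_close() with remove_first set fails to resolve
'_libC_fini', while delaying the removal resolves it. It would also be
consistent with the BIND_NOW data point, since an eagerly bound symbol needs
no lookup at teardown time, but the model does not try to show that.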


