libdl usage count wrapping

Thu Sep 25 21:17:10 UTC 2008

This patch was intended to fix a memory leak, not the count overflow.
We still have a problem in that area.  We worked around it by no longer
calling dlopen() and dlclose() repeatedly; instead we call dlopen() just
once and reuse the handle.
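
The workaround can be sketched roughly as follows. This is a minimal
illustration, not the code we actually run: the helper names
get_plugin_handle() and release_plugin_handle() are hypothetical, and it
assumes a single library and a single thread.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical cache for one library handle (assumption: one library,
 * no threading). */
static void *cached_handle = NULL;

/* Opens the library on the first call and returns the same handle on
 * every later call, so dlopen()/dlclose() are not cycled in a loop.
 * The path argument is only used on the first call. */
void *get_plugin_handle(const char *path)
{
    if (cached_handle == NULL) {
        cached_handle = dlopen(path, RTLD_NOW);
        if (cached_handle == NULL)
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
    }
    return cached_handle;
}

/* Intended to be called once at shutdown, if at all. */
void release_plugin_handle(void)
{
    if (cached_handle != NULL) {
        dlclose(cached_handle);
        cached_handle = NULL;
    }
}
```

Because the library is opened only once, its usage count is incremented
only once, which sidesteps the overflow rather than fixing it.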

Regards. 
Mark K Vallevand

We old folks have to find our cushions and pillows in our tankards.
Strong beer is the milk of the old.
- Martin Luther

-----Original Message-----
From: Kevin Day [mailto:thekevinday at gmail.com] 
Sent: Thursday, September 25, 2008 4:11 PM
To: Vallevand, Mark K
Cc: uclibc at uclibc.org
Subject: Re: libdl usage count wrapping

On Wed, Sep 24, 2008 at 10:15 AM, Vallevand, Mark K
<Mark.Vallevand at unisys.com> wrote:
> And, here's the patch I used to fix the memory leak.
>
> --- old/ldso/libdl/libdl.c      2008-09-17 08:42:34.000000000 -0500
> +++ uClibc/ldso/libdl/libdl.c   2008-09-16 15:14:44.000000000 -0500
> @@ -632,6 +632,13 @@
>                }
>        }
>        free(handle->init_fini.init_fini);
> +    while ( handle->next )
> +    {
> +        struct dyn_elf *t;
> +        t = handle->next->next;
> +        free ( handle->next );
> +        handle->next = t;
> +    }
>        free(handle);
>
>
>
> Regards.
> Mark K Vallevand

I've tried this patch on my system and the problem persists.
Again, I am using uClibc 0.9.28.3, and that may mean additional
problems exist outside of this.
I also updated my tests to dlopen() libraries that are on my system and
then ran the program under valgrind.
My tests are against a live system, and I did not have debugging
enabled in uClibc (I intend to rebuild it with debugging enabled).
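
For readers without the test program, the kind of loop being exercised
looks roughly like this. It is only a sketch: stress-dlopen.c itself is
not included in this message, so the function name stress_dlopen(), its
parameters, and its error reporting are assumptions, and the real test
uses separate openlibs() and closelibs() steps.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical stand-in for the stress test. Returns the number of
 * completed dlopen()/dlclose() cycles, stopping at the first failure. */
int stress_dlopen(const char *path, int iterations)
{
    int done = 0;

    for (int i = 0; i < iterations; i++) {
        void *handle = dlopen(path, RTLD_NOW);
        if (handle == NULL) {
            fprintf(stderr, "dlopen failed at call %d: %s\n", i + 1, dlerror());
            break;
        }
        if (dlclose(handle) != 0) {
            fprintf(stderr, "dlclose failed at call %d: %s\n", i + 1, dlerror());
            break;
        }
        done++;
    }
    return done;
}
```

Passing a library path and a large iteration count (50000 in the runs
below) gives the open/close pattern that triggers the crash here.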

My results:

 Call ... 6552/50000
opened 0
closed 0

 Call ... 6553/50000
opened 0
==1755== Jump to the invalid address stated on the next line
==1755==    at 0x4071870: ???
==1755==    by 0x80484DF: closelibs (stress-dlopen.c:42)
==1755==    by 0x8048566: main (stress-dlopen.c:59)
==1755==  Address 0x4071870 is not stack'd, malloc'd or (recently) free'd
==1755==
==1755== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1755==  Access not within mapped region at address 0x4071870
==1755==    at 0x4071870: ???
==1755==    by 0x80484DF: closelibs (stress-dlopen.c:42)
==1755==    by 0x8048566: main (stress-dlopen.c:59)
==1755==
==1755== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==1755== malloc/free: in use at exit: 332 bytes in 8 blocks.
==1755== malloc/free: 275,276 allocs, 275,268 frees, 15,906,926 bytes allocated.
==1755== For counts of detected errors, rerun with: -v
==1755== searching for pointers to 8 not-freed blocks.
==1755== checked 38,064 bytes.
==1755==
==1755==
==1755== 12 bytes in 1 blocks are still reachable in loss record 1 of 4
==1755==    at 0x40197C8: malloc (in /toolchain/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==1755==    by 0x401F1CB: dlopen (in /lib/libdl-0.9.28.so)
==1755==    by 0x8048467: openlibs (stress-dlopen.c:28)
==1755==    by 0x804854B: main (stress-dlopen.c:57)
==1755==
==1755==
==1755== 24 bytes in 1 blocks are still reachable in loss record 2 of 4
==1755==    at 0x40197C8: malloc (in /toolchain/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==1755==    by 0x401EF99: dlopen (in /lib/libdl-0.9.28.so)
==1755==    by 0x8048467: openlibs (stress-dlopen.c:28)
==1755==    by 0x804854B: main (stress-dlopen.c:57)
==1755==
==1755==
==1755== 24 bytes in 3 blocks are still reachable in loss record 3 of 4
==1755==    at 0x40197C8: malloc (in /toolchain/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==1755==    by 0x401F21D: dlopen (in /lib/libdl-0.9.28.so)
==1755==    by 0x8048467: openlibs (stress-dlopen.c:28)
==1755==    by 0x804854B: main (stress-dlopen.c:57)
==1755==
==1755==
==1755== 272 bytes in 3 blocks are still reachable in loss record 4 of 4
==1755==    at 0x40197C8: malloc (in /toolchain/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==1755==    by 0x40014D1: _dl_malloc (in /lib/ld-uClibc-0.9.28.so)
==1755==    by 0x400168C: _dl_add_elf_hash_table (in /lib/ld-uClibc-0.9.28.so)
==1755==    by 0x4002974: _dl_load_elf_shared_library (in /lib/ld-uClibc-0.9.28.so)
==1755==    by 0x4002E25: _dl_load_shared_library (in /lib/ld-uClibc-0.9.28.so)
==1755==    by 0x401F130: dlopen (in /lib/libdl-0.9.28.so)
==1755==    by 0x8048467: openlibs (stress-dlopen.c:28)
==1755==    by 0x804854B: main (stress-dlopen.c:57)
==1755==
==1755== LEAK SUMMARY:
==1755==    definitely lost: 0 bytes in 0 blocks.
==1755==      possibly lost: 0 bytes in 0 blocks.
==1755==    still reachable: 332 bytes in 8 blocks.
==1755==         suppressed: 0 bytes in 0 blocks.
Segmentation fault

Interestingly enough, when I enable (partially incomplete) SSP
protection with -fstack-protector-all -lssp, the crash happens later:
 Call ... 7280/50000
opened 0
closed 0

 Call ... 7281/50000
opened 0
Segmentation fault (core dumped)

Using this particular executable with gdb, I get the following:

 Call ... 7280/50000
opened 0
closed 0

 Call ... 7281/50000
opened 0

Program received signal SIGSEGV, Segmentation fault.
0xb7ef3870 in free () from /lib/libc.so.0

As a quick fix, perhaps a free needs to be wrapped in an if:
if (variable) free(variable);

That leaves the real problem: which specific free is failing, and
why is 'whatever is being freed' unallocated?
Is this a double free, or was the address never allocated for some
reason?
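
One caveat on the guard suggested above: the C standard defines
free(NULL) as a no-op, so an if around free() only matters when the
pointer is NULL, and a stale non-NULL pointer passes the check anyway.
If this does turn out to be a double free, the guard helps only when the
pointer is also reset after the first free. A minimal sketch of that
pattern follows; the macro name FREE_AND_CLEAR is hypothetical, and
nothing here has been confirmed against the libdl code.

```c
#include <stdlib.h>

/* Hypothetical helper: frees the pointer and resets it to NULL so that
 * a second free of the same variable becomes a harmless free(NULL). */
#define FREE_AND_CLEAR(p) \
    do {                  \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```

This does not help if two different variables hold the same address, or
if the address was never returned by malloc() in the first place.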

-- 
Kevin Day