libdl usage count wrapping

Vallevand, Mark K Mark.Vallevand at UNISYS.com
Wed Sep 24 15:15:59 UTC 2008


And, here's the patch I used to fix the memory leak.

--- old/ldso/libdl/libdl.c	2008-09-17 08:42:34.000000000 -0500
+++ uClibc/ldso/libdl/libdl.c	2008-09-16 15:14:44.000000000 -0500
@@ -632,6 +632,13 @@
 		}
 	}
 	free(handle->init_fini.init_fini);
+    while ( handle->next )
+    {
+        struct dyn_elf *t;
+        t = handle->next->next;
+        free ( handle->next );
+        handle->next = t;
+    }
 	free(handle);
 


Regards. 
Mark K Vallevand

We old folks have to find our cushions and pillows in our tankards.
Strong beer is the milk of the old.
- Martin Luther


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you
received this in error, please contact the sender and delete the e-mail
and its attachments from all computers.


-----Original Message-----
From: uclibc-bounces at uclibc.org [mailto:uclibc-bounces at uclibc.org] On
Behalf Of Vallevand, Mark K
Sent: Wednesday, September 24, 2008 9:53 AM
To: uclibc at uclibc.org
Subject: RE: libdl usage count wrapping

I'd be interested to see if you see memory leaking as this program runs.
I saw a leak of 24 bytes per dlopen()/dlclose() pair.  So, after 32k
iterations it should have grown by 750k or more.  A couple of ps
commands should show its growth.

Regards. 
Mark K Vallevand



-----Original Message-----
From: Kevin Day [mailto:thekevinday at gmail.com] 
Sent: Tuesday, September 23, 2008 5:05 PM
To: carmelo73 at gmail.com
Cc: Vallevand, Mark K; uclibc at uclibc.org
Subject: Re: libdl usage count wrapping

On Tue, Sep 23, 2008 at 3:06 PM, Carmelo Amoroso <carmelo73 at gmail.com>
wrote:
> I'll look at these two issues soon.
>
> Thanks,
> Carmelo
> Vallevand, Mark K wrote:
>> Wow.  I just ran into this problem.  Or, something very similar.  I
>> reported a memory leak in dlopen() dlclose() last week.  I've got a
>> fix for that problem, and my program doesn't leak any more.  But, now
>> it's crashing consistently after a period of time.  The program makes
>> heavy use of dlopen() dlclose().
>>
>> Looking at dlopen() dlclose(), I'm probably not going to look for
>> another fix there.  I'm going to fix my program to dlopen() once and
>> leave libraries open.
>>
>> Regards.
>> Mark K Vallevand
>>
>>
>> -----Original Message-----
>> From: uclibc-bounces at uclibc.org [mailto:uclibc-bounces at uclibc.org] On
>> Behalf Of Phil Estes
>> Sent: Monday, September 22, 2008 9:45 PM
>> To: uclibc at uclibc.org
>> Subject: libdl usage count wrapping
>>
>> Recently I was looking into an issue where someone was claiming pam was
>> segfaulting after a lot of usage (lots of calls to authenticate
>> users--usually around 5K calls).  My investigation led me to the point
>> where I realized that the dlopen() and dlclose() management of ref.
>> counting is not balanced, which leads to the heaviest "DL_NEEDED"
>> libraries basically getting incremented to the point of overflowing
>> "unsigned short usage_count".  This leads to a nasty situation where
>> libc.so is munmapped (because usage_count == 0), and the next call to a
>> C runtime function traps, of course.
>>
>> Since I was working with 0.9.29 snapshot from 2006, I decided to see if
>> anything in SVN has changed that might impact this.  I was interested to
>> find the 17530 changeset and accompanying discussion
>> ( http://uclibc.org/lists/uclibc/2007-January/017165.html ) and while
>> testing it, noted a 10x improvement, given that more of the dependent
>> libs are added to the list that is walked at do_dlclose that includes a
>> decrement of usage_count.  However, it is still not exact in that
>> libc.so's usage_count continues to rise even with matching dlopen() and
>> dlclose() calls each iteration through the example program (I'll attach
>> below), and now needs 65K iterations of loading/unloading a lib to get a
>> segfault.
>>
>> One way to 'watch' this in gdb is to add a breakpoint in libdl.c around
>> line 247 or so and use the commands interface to do the following
>> output:
>> commands <brnum>
>> silent
>> printf "%d : %s\n",(*tpnt1)->usage_count, lpntstr
>> cont
>> end
>>
>> Now continue and watch libc's usage_count climb, and if you are patient
>> enough, you will get a segfault somewhere after 65,500 iterations.
>>
>> Obviously one potential fix is to "handle" the wrap of usage_count in
>> ldso/dl-elf.c by checking for 0 after the increment and setting to "near
>> max" ..which would then never allow a wrap to zero to occur which causes
>> the segfault.
>>
>> However, it would be interesting to know if the uClibc maintainers think
>> usage_count is important enough for some of the core libs to either (a)
>> be correct, or (b) be protected from the segfault condition which is
>> less likely than it used to be given the aforementioned changes, but
>> still potential for long-running apps (like pam running on a system with
>> a very long uptime).  I'm slightly interested in trying to fix, but
>> given I'm no expert on all the various lists and pointers employed via
>> dlopen() it seems like someone with some skill in the area should make
>> sure it's done right.  My hunch is that it has to do with the init_fini
>> list creation and the filtering out of RTLD_GLOBAL libs, but I'm not
>> sure yet without more debug...but the init_fini list seems to be what is
>> walked at dlclose that has any relation to decrementing usage_count.
>>
>> Thanks for any thoughts/input,
>> Phil Estes
>> estesp at linux.vnet.ibm.com
>>
>> Here's my example program for creating the segfault condition..it can
>> obviously be doctored to load different libs, more libs, less libs, etc.
>>
>> ---stress-dlopen.c----
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <dlfcn.h>
>>
>> #define NUMLIBS 4
>> #define ITERS 50000
>>
>> void *handles[NUMLIBS];
>> /*
>> char *names[NUMLIBS] = { "/lib/security/pam_deny.so",
>>                          "/lib/security/pam_warn.so",
>>                          "/lib/libpam.so.0" };
>> */
>> char *names[NUMLIBS] = { "/usr/lib/libxml2.so.2",
>>                          "/usr/lib/libpcap.so",
>>                          "/usr/lib/libpng.so.2",
>>                          "/usr/lib/libxslt.so.1" };
>>
>> int openlibs()
>> {
>>       int i, errors=0;
>>       for (i = 0; i < NUMLIBS; i++) {
>>               handles[i] = dlopen(names[i], RTLD_NOW);
>>               if (handles[i] == 0) {
>>                       errors++;
>>                       fprintf(stderr, "%s\n", dlerror());
>>               }
>>       }
>>       return errors;
>> }
>>
>> int closelibs()
>> {
>>       int i;
>>       for (i = 0; i < NUMLIBS; i++) {
>>               if (handles[i] != 0) {
>>                       dlclose(handles[i]);
>>                       handles[i] = NULL;
>>               }
>>       }
>>       return 0;
>> }
>>
>> int main()
>> {
>>       unsigned int retcode = 0, i = 0, cnt = ITERS;
>>
>>       for(i=0;i<cnt;i++)
>>       {
>>               printf("\n Call ... %d/%d\n",i,cnt);
>>               retcode = openlibs();
>>               printf("opened %d\n",retcode);
>>               retcode = closelibs();
>>               printf("closed %d\n",retcode);
>>
>>       }
>>       return 0;
>>
>> }
>>
>>
>>
>>
>> _______________________________________________
>> uClibc mailing list
>> uClibc at uclibc.org
>> http://busybox.net/cgi-bin/mailman/listinfo/uclibc

I felt the need to test this on my uClibc 0.9.28.3
and got the following segfault:

 Call ... 32766/50000
File not found
opened 1
closed 0

 Call ... 32767/50000
File not found
opened 1
Segmentation fault (core dumped)

What's particularly interesting is that 32767 is just under the magical
32768.
Makes me want to think this is an integer overflow issue.
Possibly a signed integer somewhere, and if this is changed to an
unsigned that would be 65534 (I am assuming)
This would match the 65k issue from 0.9.29 svn.

-- 
Kevin Day





