libdl usage count wrapping
Vallevand, Mark K
Mark.Vallevand at UNISYS.com
Wed Sep 24 15:15:59 UTC 2008
And, here's the patch I used to fix the memory leak.
--- old/ldso/libdl/libdl.c 2008-09-17 08:42:34.000000000 -0500
+++ uClibc/ldso/libdl/libdl.c 2008-09-16 15:14:44.000000000 -0500
@@ -632,6 +632,13 @@
 		}
 	}
 	free(handle->init_fini.init_fini);
+	while ( handle->next )
+	{
+		struct dyn_elf *t;
+		t = handle->next->next;
+		free ( handle->next );
+		handle->next = t;
+	}
 	free(handle);
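
The loop added by the patch can be tried in isolation. What follows is only a minimal sketch, not uClibc code: struct node, make_chain() and free_tail() are hypothetical names standing in for struct dyn_elf and the chain hanging off the handle.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct dyn_elf; only the link field matters. */
struct node {
    struct node *next;
};

/* Build a chain of n nodes and return its head (NULL if n == 0 or on
 * allocation failure, in which case anything already built is freed). */
struct node *make_chain(size_t n)
{
    struct node *head = NULL;

    while (n-- > 0) {
        struct node *p = malloc(sizeof *p);
        if (p == NULL) {
            while (head != NULL) {
                struct node *t = head->next;
                free(head);
                head = t;
            }
            return NULL;
        }
        p->next = head;
        head = p;
    }
    return head;
}

/* Free every node after head, in the same order as the loop in the patch.
 * head itself is left for the caller to free. Returns the number of
 * nodes released. */
size_t free_tail(struct node *head)
{
    size_t freed = 0;

    while (head->next != NULL) {
        struct node *t = head->next->next; /* save the successor first */
        free(head->next);
        head->next = t;
        freed++;
    }
    return freed;
}
```

For a chain of five nodes, free_tail() releases four of them and leaves head->next set to NULL, so a following free(head) completes the cleanup just as free(handle) does in the patch.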
Regards.
Mark K Vallevand
We old folks have to find our cushions and pillows in our tankards.
Strong beer is the milk of the old.
- Martin Luther
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you
received this in error, please contact the sender and delete the e-mail
and its attachments from all computers.
-----Original Message-----
From: uclibc-bounces at uclibc.org [mailto:uclibc-bounces at uclibc.org] On
Behalf Of Vallevand, Mark K
Sent: Wednesday, September 24, 2008 9:53 AM
To: uclibc at uclibc.org
Subject: RE: libdl usage count wrapping
I'd be interested to see if you see memory leaking as this program runs.
I saw a leak of 24 bytes per dlopen()/dlclose() pair. So, after 32k
iterations it should have grown by 750k or more. A couple of ps
commands should show its growth.
Regards.
Mark K Vallevand
-----Original Message-----
From: Kevin Day [mailto:thekevinday at gmail.com]
Sent: Tuesday, September 23, 2008 5:05 PM
To: carmelo73 at gmail.com
Cc: Vallevand, Mark K; uclibc at uclibc.org
Subject: Re: libdl usage count wrapping
On Tue, Sep 23, 2008 at 3:06 PM, Carmelo Amoroso <carmelo73 at gmail.com> wrote:
> I'll look at these two issues soon.
>
> Thanks,
> Carmelo
> Vallevand, Mark K wrote:
>> Wow. I just ran into this problem, or something very similar. I
>> reported a memory leak in dlopen()/dlclose() last week. I've got a fix
>> for that problem, and my program doesn't leak any more. But now it's
>> crashing consistently after a period of time. The program makes heavy
>> use of dlopen() and dlclose().
>>
>> Looking at dlopen() and dlclose(), I'm probably not going to look for
>> another fix there. I'm going to fix my program to dlopen() once and
>> leave libraries open.
>>
>> Regards.
>> Mark K Vallevand
>>
>> -----Original Message-----
>> From: uclibc-bounces at uclibc.org [mailto:uclibc-bounces at uclibc.org] On
>> Behalf Of Phil Estes
>> Sent: Monday, September 22, 2008 9:45 PM
>> To: uclibc at uclibc.org
>> Subject: libdl usage count wrapping
>>
>> Recently I was looking into an issue where someone claimed pam was
>> segfaulting after a lot of usage (lots of calls to authenticate
>> users, usually around 5K calls). My investigation led me to realize
>> that the dlopen() and dlclose() management of reference counting is
>> not balanced, which leads to the heaviest "DL_NEEDED" libraries
>> being incremented to the point of overflowing "unsigned short
>> usage_count". This leads to a nasty situation where libc.so is
>> munmapped (because usage_count == 0), and the next call to a C
>> runtime function traps, of course.
>>
>> Since I was working with a 0.9.29 snapshot from 2006, I decided to see
>> if anything in SVN had changed that might impact this. I was
>> interested to find the 17530 changeset and accompanying discussion
>> ( http://uclibc.org/lists/uclibc/2007-January/017165.html ) and, while
>> testing it, noted a 10x improvement, given that more of the dependent
>> libs are added to the list that is walked at do_dlclose, which
>> includes a decrement of usage_count. However, it is still not exact:
>> libc.so's usage_count continues to rise even with matching dlopen()
>> and dlclose() calls on each iteration of the example program (attached
>> below), and it now takes 65K iterations of loading/unloading a lib to
>> get a segfault.
>>
>> One way to 'watch' this in gdb is to add a breakpoint in libdl.c
>> around line 247 or so and use the commands interface to produce the
>> following output:
>>   commands <brnum>
>>   silent
>>   printf "%d : %s\n",(*tpnt1)->usage_count, lpntstr
>>   cont
>>   end
>>
>> Now continue and watch libc's usage_count climb, and if you are
>> patient enough, you will get a segfault somewhere after 65,500
>> iterations.
>>
>> Obviously one potential fix is to "handle" the wrap of usage_count in
>> ldso/dl-elf.c by checking for 0 after the increment and setting it to
>> "near max", which would then never allow the wrap to zero that causes
>> the segfault.
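
The "never wrap to zero" idea can be sketched as a saturating counter. This is an illustration under assumptions, not the code in ldso/dl-elf.c: usage_inc() and usage_dec() are hypothetical helpers, and this variant checks before the increment rather than checking for 0 afterwards. A counter that reaches USHRT_MAX is pinned there, so the library is simply never unloaded.

```c
#include <limits.h>

/* Hypothetical helper: increment an unsigned short usage counter, but
 * pin it at USHRT_MAX instead of letting it wrap around to 0. */
unsigned short usage_inc(unsigned short count)
{
    if (count == USHRT_MAX)
        return count;                  /* saturated: stay pinned */
    return (unsigned short)(count + 1);
}

/* Hypothetical helper: decrement the counter. A pinned counter stays
 * pinned, because the true number of users is no longer known, and a
 * counter that is already 0 stays at 0. */
unsigned short usage_dec(unsigned short count)
{
    if (count == USHRT_MAX || count == 0)
        return count;
    return (unsigned short)(count - 1);
}
```

The cost of this choice is that a pinned library stays mapped for the life of the process, which seems preferable to unmapping libc.so while it is still in use.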
>>
>> However, it would be interesting to know if the uClibc maintainers
>> think usage_count is important enough for some of the core libs to
>> either (a) be correct, or (b) be protected from the segfault
>> condition, which is less likely than it used to be given the
>> aforementioned changes but is still possible for long-running apps
>> (like pam running on a system with a very long uptime). I'm slightly
>> interested in trying to fix it, but given that I'm no expert on all
>> the various lists and pointers employed via dlopen(), it seems like
>> someone with some skill in the area should make sure it's done right.
>> My hunch is that it has to do with the init_fini list creation and
>> the filtering out of RTLD_GLOBAL libs, but I'm not sure yet without
>> more debugging. The init_fini list seems to be the list walked at
>> dlclose that has any relation to decrementing usage_count.
>>
>> Thanks for any thoughts/input,
>> Phil Estes
>> estesp at linux.vnet.ibm.com
>>
>> Here's my example program for creating the segfault condition. It can
>> obviously be doctored to load different libs, more libs, fewer libs,
>> etc.
>>
>> ---stress-dlopen.c----
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <dlfcn.h>
>>
>> #define NUMLIBS 4
>> #define ITERS 50000
>>
>> void *handles[NUMLIBS];
>> /*
>> char *names[NUMLIBS] = { "/lib/security/pam_deny.so",
>>                          "/lib/security/pam_warn.so",
>>                          "/lib/libpam.so.0" };
>> */
>> char *names[NUMLIBS] = { "/usr/lib/libxml2.so.2",
>>                          "/usr/lib/libpcap.so",
>>                          "/usr/lib/libpng.so.2",
>>                          "/usr/lib/libxslt.so.1" };
>>
>> /* Open every library in names[]; return the number that failed. */
>> int openlibs(void)
>> {
>>     int i, errors = 0;
>>     for (i = 0; i < NUMLIBS; i++) {
>>         handles[i] = dlopen(names[i], RTLD_NOW);
>>         if (handles[i] == NULL) {
>>             errors++;
>>             fprintf(stderr, "%s\n", dlerror());
>>         }
>>     }
>>     return errors;
>> }
>>
>> /* Close every library that was opened successfully. */
>> int closelibs(void)
>> {
>>     int i;
>>     for (i = 0; i < NUMLIBS; i++) {
>>         if (handles[i] != NULL) {
>>             dlclose(handles[i]);
>>             handles[i] = NULL;
>>         }
>>     }
>>     return 0;
>> }
>>
>> int main(void)
>> {
>>     unsigned int i, cnt = ITERS;
>>     int retcode;
>>
>>     for (i = 0; i < cnt; i++) {
>>         printf("\n Call ... %u/%u\n", i, cnt);
>>         /* Note: "opened" reports the number of dlopen() failures. */
>>         retcode = openlibs();
>>         printf("opened %d\n", retcode);
>>         retcode = closelibs();
>>         printf("closed %d\n", retcode);
>>     }
>>     return 0;
>> }
>>
>>
>>
>>
>> _______________________________________________
>> uClibc mailing list
>> uClibc at uclibc.org
>> http://busybox.net/cgi-bin/mailman/listinfo/uclibc
I felt the need to test this on my uClibc 0.9.28.3
and got the following segfault:
Call ... 32766/50000
File not found
opened 1
closed 0
Call ... 32767/50000
File not found
opened 1
Segmentation fault (core dumped)
What's particularly interesting is that 32767 is just under the magical
32768.
This makes me think this is an integer overflow issue,
possibly a signed integer somewhere; if it were changed to
unsigned, the limit would be 65534 (I am assuming).
This would match the 65k issue from 0.9.29 svn.
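
To put numbers on the two thresholds: a signed 16-bit integer tops out at 32767 and an unsigned one at 65535, so an unsigned 16-bit counter starting at 0 wraps back to 0 on its 65536th increment. The following is only a sketch to check that arithmetic; increments_until_wrap() is a hypothetical helper and is not taken from uClibc.

```c
#include <stdint.h>

/* Count how many increments an unsigned 16-bit counter takes, starting
 * from start, before it wraps around to 0. */
unsigned long increments_until_wrap(uint16_t start)
{
    uint16_t c = start;
    unsigned long n = 0;

    do {
        c = (uint16_t)(c + 1); /* unsigned arithmetic wraps modulo 65536 */
        n++;
    } while (c != 0);

    return n;
}
```

Starting from 0 this returns 65536, and starting from 1 it returns 65535, which is in line with a segfault "somewhere after 65,500 iterations" if the counter gains roughly one unmatched increment per iteration.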
--
Kevin Day