[uClibc 0001013]: pthread_cancel/pthread_join sequence hangs when using select in an other thread

bugs at busybox.net
Wed Jul 23 13:32:07 UTC 2008


A NOTE has been added to this issue. 
====================================================================== 
http://busybox.net/bugs/view.php?id=1013 
====================================================================== 
Reported By:                jalaber
Assigned To:                uClibc
====================================================================== 
Project:                    uClibc
Issue ID:                   1013
Category:                   Posix Threads
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     assigned
====================================================================== 
Date Submitted:             08-31-2006 09:04 PDT
Last Modified:              07-23-2008 06:32 PDT
====================================================================== 
Summary:                    pthread_cancel/pthread_join sequence hangs when
using select in an other thread
Description: 
Hello,

I have found a very strange bug in uClibc using
pthread_cancel/pthread_join. 

My test program launches 1 thread which basically makes a select call with
a struct timeval set to 600ms. Then the main thread calls pthread_cancel
and pthread_join, followed by a printf. The program hangs. 
However if you remove the printf call, then the program terminates
normally. I have tried to replace the select call with a sem_wait call,
and everything works fine with or without printf. So the problem seems to
happen only with select.

I use buildroot with kernel 2.4.28 and uclibc 0.9.28. I have attached the
program to reproduce the problem. If you comment out the
printf("join OK\n"), it works for me.

Thank you for your time and help,
Philippe.
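
The attached pthread_join_test.c is not reproduced in this report. The
following is only a minimal sketch of the scenario described above, not the
reporter's actual program: the names select_worker and run_cancel_join_test
and the one-second delay before the cancel are assumptions made for
illustration.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Worker thread (hypothetical name): sleeps in select() with a 600 ms
 * timeout, over and over, without watching any file descriptor. */
static void *select_worker(void *arg)
{
    (void)arg;
    for (;;) {
        struct timeval tv;
        tv.tv_sec = 0;
        tv.tv_usec = 600 * 1000;
        select(0, NULL, NULL, NULL, &tv);
    }
    return NULL;
}

/* Main-thread side (hypothetical name): cancel the worker, join it, then
 * print. Returns 0 if the worker was cancelled and joined, -1 otherwise. */
int run_cancel_join_test(void)
{
    pthread_t th;
    void *res;

    if (pthread_create(&th, NULL, select_worker, NULL) != 0)
        return -1;
    sleep(1);                       /* assumed delay: let the worker reach select() */
    if (pthread_cancel(th) != 0)
        return -1;
    if (pthread_join(th, &res) != 0)
        return -1;
    printf("join OK\n");            /* the printf mentioned in the description */
    return res == PTHREAD_CANCELED ? 0 : -1;
}
```

Calling run_cancel_join_test() from main() should return 0 on a libc where
select() acts as a cancellation point. On the affected uClibc setup, the
report suggests the call would not return.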

====================================================================== 

---------------------------------------------------------------------- 
 dwagner - 10-23-06 11:43  
---------------------------------------------------------------------- 
I think this issue is the reason why the LIRC driver of directfb RC1 does
not terminate. The driver uses select() and
pthread_cancel()/pthread_join().

Please fix that. 

---------------------------------------------------------------------- 
 vapier - 11-16-06 23:07  
---------------------------------------------------------------------- 
here's a tip ... saying things like "Please fix that." makes people think
"Fix it your goddamn self."

the hang may be because of the IO mutex being held by the canceled thread
... if you turn on PDEBUG in libpthread/linuxthreads/debug.h, that may
give you helpful output 

---------------------------------------------------------------------- 
 chombourger - 06-27-07 12:58  
---------------------------------------------------------------------- 
I tried to reproduce this issue on uclibc 0.9.29 running on a PC with a 2.6
linux kernel, and your program is still running as I am typing these lines!
Is that what you are getting? Running the same program on the host (compiled
and linked against glibc) worked. I will now try to enable the debug traces
to see if that helps. 

---------------------------------------------------------------------- 
 chombourger - 06-27-07 13:15  
---------------------------------------------------------------------- 
traces with debug enabled in linuxthreads.old:

26294 : __pthread_initialize_manager: manager stack: size=8160,
bos=0x804a150, tos=0x804c130
26294 : __pthread_initialize_manager: send REQ_DEBUG to manager thread
26294 : pthread_create: write REQ_CREATE to manager thread
26294 : pthread_create: before suspend(self)
26295 : __pthread_manager: before poll
26295 : __pthread_manager: after poll
26295 : __pthread_manager: before __libc_read
26295 : __pthread_manager: after __libc_read, n=148
26295 : __pthread_manager: got REQ_CREATE
26295 : pthread_handle_create: cloning new_thread = 0xbf1ffe20
26295 : pthread_handle_create: new thread pid = 26296
26295 : __pthread_manager: restarting -1208466944
26294 : pthread_create: after suspend(self)
26295 : __pthread_manager: before poll
26296 : pthread_start_thread:
step 0
step 1
step 2
26295 : __pthread_manager: after poll
26295 : __pthread_manager: before poll
step 3
cancel th...
26294 : pthread_cancel: sending cancel signal to 26296
26294 : pthread_cancel: kill returned 0 

---------------------------------------------------------------------- 
 chombourger - 06-30-07 14:22  
---------------------------------------------------------------------- 
It seems that the created thread has no jmpbuf when
pthread_handle_sigcancel() is called in it, so the signal handler simply
returns and the thread is not rerouted.

select() does not behave like a cancellation point (although it should).
Could it be because select() is simply a syscall5 and we therefore never
reach the sigwait() function of the pthread library?

If I modify the select() call as follows, the thread is indeed canceled:

r = select(0, NULL, NULL, NULL, &tv);
if ((r == -1) && (errno == EINTR)) pthread_testcancel();
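
The same workaround can be packaged as a small helper so that every caller
gets the explicit cancellation check. This is only a sketch under the
assumption that select() is interrupted with EINTR when the cancel signal
arrives; the name select_with_cancel_check is hypothetical and is not part
of uClibc.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: call select() and, if it was interrupted by a
 * signal, give a pending cancellation request the chance to be acted on. */
int select_with_cancel_check(int nfds, fd_set *readfds, fd_set *writefds,
                             fd_set *exceptfds, struct timeval *timeout)
{
    int r = select(nfds, readfds, writefds, exceptfds, timeout);

    if (r == -1 && errno == EINTR)
        pthread_testcancel();   /* does not return if a cancel is pending */
    return r;
}
```

A caller would use it exactly like select(), for example
select_with_cancel_check(0, NULL, NULL, NULL, &tv) for a plain timed sleep.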

I eventually found where linuxthreads.old defines cancellable system
calls: wrapsyscall.c and added an entry for select(2). 
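
For illustration, a cancellable wrapper in the style of wrapsyscall.c could
look roughly like the sketch below. It rests on the assumption that the
wrappers there switch the thread to asynchronous cancellation for the
duration of the system call. It is not the attached patch, and the name
cancellable_select is hypothetical.

```c
#include <pthread.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical wrapper: make the blocking call cancellable by enabling
 * asynchronous cancellation around it, then restore the previous type. */
int cancellable_select(int nfds, fd_set *readfds, fd_set *writefds,
                       fd_set *exceptfds, struct timeval *timeout)
{
    int oldtype;
    int result;

    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
    result = select(nfds, readfds, writefds, exceptfds, timeout);
    pthread_setcanceltype(oldtype, NULL);
    return result;
}
```

With this approach a cancel request that arrives while the thread is blocked
in select() can take effect immediately, instead of waiting for the next
explicit cancellation point.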

Note: select() was previously listed as a cancellation point but it got
removed by Ulrich Drepper and I don't know why:

CVSROOT:	/cvs/glibc
Module name:	libc
Changes by:	drepper at sourceware.org	2002-12-15 13:43:25

Modified files:
	linuxthreads   : wrapsyscall.c 

Log message:
	Remove creat, poll, pselect, readv, select, sigpause, sigsuspend,
	sigwaitinfo, waitid, and writev wrappers.

I have attached to this report, a patch re-introducing select(2).

 

---------------------------------------------------------------------- 
 hmoffatt - 09-04-07 21:25  
---------------------------------------------------------------------- 
I have an application which is hanging with uClibc 0.9.29. The main process
is regularly calling fork() and exec(). There is also a thread which is
doing a select() with a 1ms timeout in an endless loop. The application
sometimes hangs just after the fork/exec; the exec has happened (there is
a zombie process left around).

I tried the patch in this bug report; now the program segfaults instead of
hanging. So I don't think the patch is the correct solution. 

---------------------------------------------------------------------- 
 chombourger - 09-05-07 00:30  
---------------------------------------------------------------------- 
Three questions:

   (a) have you tried running this program on a glibc system and does it
work?
   (b) is your app. making any use of pthread_cancel() and
pthread_join()?
   (c) can you provide us a stripped down version of your application so
that we can reproduce the bug/segfault with it? 

---------------------------------------------------------------------- 
 hmoffatt - 09-05-07 02:58  
---------------------------------------------------------------------- 
My application is in Python. At the time my hang occurs I am not using
pthread_join or cancel; a finite set of threads have been created some
time earlier and should continue to exist for the life of the program.

Hence the original problem in this report doesn't describe my situation.
Nonetheless the patch had some impact which suggests it may not be right.

My original process (not one of the threads) is regularly calling fork()
and exec() (via python wrappers). It appears that the process hangs
somewhere after exec(), before returning to my interpreted program. strace
shows that the program did get SIGCHLD as the last thing that happened,
meaning that the child has exited. Then it seems to be sleeping waiting
for something to happen.

There is another thread which is calling select() with a 1ms sleep
indefinitely. I think it hangs also though I will have to retest to be
sure.

The thread manager thread seems to be still running ok, calling poll()
with a 1 second timeout. strace shows it is still running.

When I put in the patch from this report, the select() thread dies with
SIGSEGV.

I am trying to build glibc for the embedded system now. I will also try to
run it on my desktop with glibc and reproduce it. 

---------------------------------------------------------------------- 
 hmoffatt - 09-05-07 21:58  
---------------------------------------------------------------------- 
I have now tested this with glibc 2.3.6 with linuxthreads (not NPTL). It
segfaults just the same as with your patch for uClibc.

I guess that could be considered a positive sign.. probably meaning that
my bug is somewhere else. 

---------------------------------------------------------------------- 
 chombourger - 09-06-07 01:14  
---------------------------------------------------------------------- 
Ok, interesting point. Do you have a backtrace when it crashes? Wondering if
the segfault occurs within the libc. 

---------------------------------------------------------------------- 
 hmoffatt - 09-06-07 07:45  
---------------------------------------------------------------------- 
My cross-gdb insists on trying to load the host libraries rather than the
target ones so I can't get a meaningful back trace even though I have the
build and a core file. :( 

Looks like it is reading the absolute paths from the core file; I really
need to be able to prepend a path to those when loading into the
cross-gdb. Is that possible?

I have had even less success doing a live cross-gdb against gdbserver. 

---------------------------------------------------------------------- 
 chombourger - 09-06-07 07:51  
---------------------------------------------------------------------- 
You could use the gdb setting 'set solib-absolute-prefix PATH'
to tell gdb the base prefix of your target file-system

That used to work for me. Hope this helps! 

---------------------------------------------------------------------- 
 hmoffatt - 09-06-07 16:27  
---------------------------------------------------------------------- 
Great, solib-absolute-prefix was exactly what I needed.

I got the following when tracing against glibc; I don't have a build with
the latest uclibc ready to test at the moment.

Core was generated by `/usr/bin/python
/opt/calyptech/lib/webserver/server.py'.
Program terminated with signal 11, Segmentation fault.
#0  0x4021b8ec in sem_wait () from
/home/hamish/work/robots/glibc-romfs/lib/libpthread.so.0
(gdb) 
(gdb) bt
#0  0x4021b8ec in sem_wait () from
/home/hamish/work/robots/glibc-romfs/lib/libpthread.so.0
#1  0xbe5ff54c in ?? ()

gdb is insisting that my core file does not match the python binary though
so it may be confused. I'm sure that it does (I copied the binary back out
of the embedded system). I'm not sure if this is related to threads or
not. 

---------------------------------------------------------------------- 
 thomask - 07-23-08 06:32  
---------------------------------------------------------------------- 
Is there any update on this bug? Looks like it's still there. 

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
08-31-06 09:04  jalaber        New Issue                                    
08-31-06 09:04  jalaber        Status                   new => assigned     
08-31-06 09:04  jalaber        Assigned To               => uClibc          
08-31-06 09:04  jalaber        File Added: pthread_join_test.c
10-23-06 11:41  dwagner        Issue Monitored: dwagner                     
10-23-06 11:43  dwagner        Note Added: 0001715                          
11-16-06 23:07  vapier         Note Added: 0001744                          
06-27-07 12:58  chombourger    Note Added: 0002526                          
06-27-07 13:15  chombourger    Note Added: 0002527                          
06-27-07 13:26  chombourger    Note Added: 0002528                          
06-27-07 13:52  chombourger    Note Edited: 0002528                         
06-30-07 12:49  chombourger    Note Edited: 0002528                         
06-30-07 14:18  chombourger    File Added: uClibc-select-cancellation-point.patch
06-30-07 14:22  chombourger    Note Edited: 0002528                         
07-01-07 22:26  chombourger    Issue Monitored: chombourger                    
09-04-07 21:25  hmoffatt       Note Added: 0002712                          
09-05-07 00:30  chombourger    Note Added: 0002713                          
09-05-07 02:58  hmoffatt       Note Added: 0002714                          
09-05-07 21:58  hmoffatt       Note Added: 0002721                          
09-06-07 01:14  chombourger    Note Added: 0002722                          
09-06-07 07:45  hmoffatt       Note Added: 0002723                          
09-06-07 07:51  chombourger    Note Added: 0002724                          
09-06-07 16:27  hmoffatt       Note Added: 0002725                          
07-23-08 06:32  thomask        Note Added: 0009924                          
======================================================================



