init not processing SIGCHLD during reboot?

Paul Smith psmith at netezza.com
Fri Dec 1 22:35:18 UTC 2006


On Thu, 2006-11-30 at 00:44 +0100, Denis Vlasenko wrote:
> On Wednesday 29 November 2006 19:35, Paul Smith wrote:
> > 
> > This works fine if I run that command from the shell, but it fails
> > during a real reboot.  Annotating the script I can see that when I kill
> > the process it dies, but it's still there as a zombie (Z), so pidof
> > still finds it and thinks that it hasn't died yet.
> > 
> > I'm assuming that while busybox init is running my script, it's not
> > handling SIGCHLD signals and cleaning up zombies like it usually does,
> > so that's why my process is never reaped.
> > 
> > 
> > Now, of course I could make my script more complex and use ps with awk
> > or sed to find the pids of only non-zombie versions of my daemons, but
> > it seems like init should not leave zombies around like this.  There are
> > a number of options, of course: the simplest one might be to set SIG_IGN
> > on SIGCHLD before init starts shutdown processing.  Because we're
> > rebooting anyway we don't really need to handle SIGCHLD anymore, and
> > SIG_IGN will let the kernel clean up the process immediately.
> 
> Please try it, and send a patch.

Of course, once you look at the code, life gets more complex :).  The
loop in init starts any unstarted RESPAWN and ASKFIRST commands, then
it sleeps for one second, then it does a wait() to sleep until one of
its children dies.  Once that happens it reaps all the children that
have died: for each one it sets the PID to 0, then it goes back to the
beginning of the loop.  Since the PID is 0, the RESPAWN and ASKFIRST
commands that exited will be restarted.  Then we sleep for 1 second,
wait for a child death, reap children, and on and on.
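
To make that loop concrete, here is a rough sketch of the two halves
(start what isn't running, then block and reap).  This is not the real
init.c code: struct entry, start_unstarted() and reap_children() are
names I made up for illustration, the children just exit immediately,
and the one-second sleep is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an inittab entry; not BusyBox's real struct. */
struct entry {
    pid_t pid;      /* 0 means "not running, start it on the next pass" */
    int restart;    /* nonzero for RESPAWN/ASKFIRST-style entries */
};

/* Start every restartable entry whose pid is 0; return how many started.
 * Each child exits at once, standing in for a short-lived daemon. */
static int start_unstarted(struct entry *tab, size_t n)
{
    int started = 0;

    assert(tab != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!tab[i].restart || tab[i].pid != 0)
            continue;
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);           /* child */
        if (pid > 0) {
            tab[i].pid = pid;   /* parent remembers the child */
            started++;
        }
    }
    return started;
}

/* Block until one child dies, then reap every child that has already
 * died, clearing the matching pid.  Returns the number reaped. */
static int reap_children(struct entry *tab, size_t n)
{
    int reaped = 0;
    pid_t pid = wait(NULL);     /* sleeps until a child exits */

    while (pid > 0) {
        for (size_t i = 0; i < n; i++)
            if (tab[i].pid == pid)
                tab[i].pid = 0; /* marks the entry for restart */
        reaped++;
        pid = waitpid(-1, NULL, WNOHANG);   /* any more dead ones? */
    }
    return reaped;
}
```

An init-style main loop would call start_unstarted(), sleep, call
reap_children(), and repeat; any entry whose pid was cleared gets
started again on the next pass.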

So, the problem I'm seeing is because the same run() function is used to
start a command regardless of why it's being started (the state of the
system).  This is nice, but the standard run() function will block all
signals, including SIGCHLD, while the command is being run.  This makes
sense normally, since you don't want to miss reaping one child while
you're starting another one.

But for the SHUTDOWN, CTLALTDEL, and RESTART actions, this doesn't make
sense, since init is on its way out anyway.

So, my proposed change is to modify init.c:run() so that if the action
is SHUTDOWN, CTLALTDEL, or RESTART, we don't block SIGCHLD before we
fork the action.

I'll make the change and test it; if anyone thinks of a reason this is
wrong or ill-behaved let me know.

Cheers!

PS. What are the portability requirements for BB?  I notice that we're
using SIG_IGN for SIGCHLD: in the original POSIX spec this was
undefined... you had to install a handler to wait() or else you got
zombies.  In the newer POSIX/SUS specs SIG_IGN on SIGCHLD will have the
kernel release the zombie without needing a wait().  I suppose if it's
working for current BB users, we should leave well enough alone.

-- 
-----------------------------------------------------------------------------
 Paul D. Smith <psmith at netezza.com>                       http://netezza.com
 "Please remain calm--I may be mad, but I am a professional."--Mad Scientist
-----------------------------------------------------------------------------
      These are my opinions--Netezza takes no responsibility for them.


