[PATCH] init: handle kexec clean reboot

Laurent Bercot ska-dietlibc at skarnet.org
Thu May 23 13:02:49 UTC 2013


> "kill -STOP 1" then :) Yes, it works even on PID 1. :)

 Yeah, looks like it does on Linux. Mind if I still find this ugly?


>> necessary to disable supervision. If the supervision chain is rooted in
>> process 1 as it should be,
> 
> Who said "it should be"? I disagree that it's best to run supervisor
> as PID 1. Wait till a bug in the supervisor kills it and hangs
> your box.

 I said "rooted in process 1"; I didn't say that all the supervision
logic needed to be in process 1. If your supervision chain isn't rooted
in process 1, you can still hose your system with random SIGKILLs, as
sent by the OOM killer, for instance.
 That doesn't mean process 1 should be complex. It only needs to monitor
at least *one* other process - which is exactly what runit does.
busybox init and s6-svscan monitor several processes; they all qualify
as root of the supervision chain too, and they're still very simple.
Don't worry, we're on the same side here.


>> this can only be done by signalling process 1 in some way.
> Yes. SIGSTOP is the signal you look for :)

 So we have two approaches here.

 The first one:
 * stop process 1
 * scan processes to kill everything except my own session, so my calling
shell survives.
   - this requires stopping all processes during the scan, to avoid race
conditions
   - but I want my calling shell to resume after I'm done, so I'll send
SIGCONT to everyone
   - oops, I don't want to include process 1 in my SIGCONT, else it will
restart stuff I don't want restarted
   - so I need to send SIGCONT to everyone except process 1, which
requires doing it by hand
   - after a couple hundred system calls, I can exit and return control
to my calling shell
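
 To make the bookkeeping concrete, here is a rough sketch of the first
approach. It is in Python purely for illustration and is not how any
existing tool does it: read_procs and plan_signals are names made up
for this sketch, and the /proc parsing is Linux-specific.

```python
import os
import signal

def read_procs(proc_root="/proc"):
    """Return [(pid, session_id)] for every process listed under proc_root."""
    procs = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process went away while we were scanning
        # After the "(comm)" field come: state, ppid, pgrp, session, ...
        fields = stat.rsplit(")", 1)[1].split()
        procs.append((int(name), int(fields[3])))
    return procs

def plan_signals(procs, my_pid, my_sid):
    """Return the ordered (signal, pid) pairs the first approach has to send."""
    others = sorted(pid for pid, _ in procs if pid not in (1, my_pid))
    doomed = sorted(pid for pid, sid in procs
                    if pid not in (1, my_pid) and sid != my_sid)
    survivors = [pid for pid in others if pid not in doomed]
    plan = [(signal.SIGSTOP, 1)]                          # freeze process 1 first
    plan += [(signal.SIGSTOP, pid) for pid in others]     # freeze the rest for the scan
    plan += [(signal.SIGKILL, pid) for pid in doomed]     # kill every other session
    plan += [(signal.SIGCONT, pid) for pid in survivors]  # wake my session, never process 1
    return plan
```

 Carrying out the plan takes one os.kill(pid, sig) call per entry, on
top of the /proc reads, which is where all the system calls come from.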

 The second one:
 * have process 1 listen to a signal (or any kind of message) that tells
it "going to shut down".
 * process 1 stops supervising services and executes into a shutdown
script, which can be provided by the admin.
 * said shutdown script runs as pid 1, so it can kill -9 -1 without
any further questions. One system call -> clean slate.
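
 A minimal sketch of the second approach, again in Python for
illustration only (a real process 1 would more likely be written in C).
SIGUSR1, the script path and the function names are assumptions of this
sketch, not something s6 or any other init prescribes.

```python
import os
import signal

SHUTDOWN_SIGNAL = signal.SIGUSR1    # assumption: any signal or message would do
SHUTDOWN_SCRIPT = "/etc/shutdown"   # hypothetical path, provided by the admin

def handle_event(signum):
    """Decide what process 1 does when signum arrives."""
    return "shutdown" if signum == SHUTDOWN_SIGNAL else "reap"

def reap_children():
    """Collect exited children; a real supervisor would restart services here."""
    while True:
        try:
            pid, _ = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left
        if pid == 0:
            return  # children exist, but none has exited

def run_init():
    watched = {SHUTDOWN_SIGNAL, signal.SIGCHLD}
    # Block the signals and collect them synchronously, so none is lost
    # between two iterations of the loop.
    signal.pthread_sigmask(signal.SIG_BLOCK, watched)
    while True:
        signum = signal.sigwait(watched)
        if handle_event(signum) == "shutdown":
            break
        reap_children()
    # Stop supervising: restore the signal mask and execute into the
    # shutdown script, which keeps running as pid 1.
    signal.pthread_sigmask(signal.SIG_UNBLOCK, watched)
    os.execv(SHUTDOWN_SCRIPT, [SHUTDOWN_SCRIPT])
```

 Since the script replaces process 1 instead of being spawned by it,
nothing is left to restart the services it kills.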

 Call me stubborn, but I still think the second approach is cleaner.
s6 has been designed to work like this and it has given me the fastest
shutdown/boot times of all the init systems I've tried - and I've tried
a few.

 Additionally, this approach only requires that kill(-1, something)
doesn't hit process 1, which is the case with virtually every Unix
kernel out there. Why Linux chose to do things differently makes no
sense to me: process 1 is still special with Linux since killing it
makes your kernel panic. Why choose to protect *the sending process*
from signals sent to everyone, instead of protecting *process 1*
which is the special, vital one? This screams "half-thought design".

 Ah well. If we disagree on this, we can still agree that systemd sucks. ;)

-- 
 Laurent

